A terminological knowledge representation system is developed which provides a
knowledge-based approach to information retrieval. A theory of categorization
provides a unifying theme for information organization and data modeling. The theory
of  categorization  is  based  on  a  tradeoff  between  reasoning  about  empirical  obser- 
vations (case-based  reasoning),  and  reasoning  by  using  abstract  cognitive  models 
(explanation-based  learning).  The  resulting  category  structure  accounts  for  several 
important  category-related  phenomena  such  as  family  resemblance,  prototype  effects, 
and  default  reasoning.  The  category  theory  was  incorporated  into  the  design  of  a 
terminological  knowledge  representation  system.  This  system  provides  a  number  of 
reasoning  capabilities.  In  addition  to  the  Classification  procedures  and  Subsumption 
functions  provided  by  other  terminological  knowledge  representation  systems,  some 
new  features  are  introduced.  Intersection  is  a  function  that  compares  two  instances 
and  generates  new  class  descriptions  based  on  similarity  of  features.  Evolution  is  a 
function  that  alters  existing  class  structures  to  accommodate  exceptions.  Exceptions 
are identified by an Exception Condition. These reasoning capabilities are applied in
a conceptual clustering algorithm for semiautomatic generation of database schema
and to new query specification and processing techniques. Finally, natural language
processing  is  closely  coupled  with  the  data  model  and  query  processing.  The  theory  of 
categorization  is  directly  related  to  a  theory  of  word  meaning.  A  new  design  for  repre- 
senting lexical  knowledge  is  presented,  one  which  facilitates  lexical  acquisition.  Some 
relationships  between  natural  language  and  qualitative  simulation  are  also  presented. 

CHAPTER 1
INTRODUCTION 


An  information  retrieval  system  must  deal  with  large  volumes  of  heterogeneous 
information  in  a  variety  of  media.  That  is,  an  information  retrieval  system  necessar- 
ily must  deal  with  broad  domains,  and  must  be  capable  of  closely  integrating  many 
diverse  subject  areas.  The  system  will  not  be  confined  to  text  or  numerical  data,  but 
must also include multimedia capabilities. The information can take the form of
computer graphics, many kinds of images, sounds, and a variety of complex applications
such  as  expert  systems  and  computer  simulations.  Information  retrieval  systems  are 
especially  interesting  because  they  represent  the  form  that  libraries  will  take  in  the 
future.  Such  libraries  will  provide  many  services  to  users.  One  service  will  be  to 
understand  the  user's  questions  and  directly  retrieve  and  summarize  the  information 
relevant  to  these  questions. 

The  work  below  is  motivated  by  many  years  of  effort  in  building  working  infor- 
mation retrieval  systems  designed  for  a  wide  audience.  After  working  with  conven- 
tional information  retrieval  technologies  (keyword  searching  techniques,  hypertext, 
and  videotex),  it  is  clear  that  the  information  retrieval  problem  is  far  from  solved. 
Whether  or  not  the  user  is  knowledgeable  about  computers  (the  average  user  is  not), 
it  is  very  difficult  to  specify  a  simple  request  for  information  and  get  a  direct  and  rele- 
vant answer.  The  current  interfaces  are  limited  to  simple  browsing  via  menu  selection 
or  word-oriented  searching.  The  consequence  is  that  the  user  often  misses  important 
information,  and  usually  retrieves  large  quantities  of  irrelevant  information.  The  work 
below  proposes  some  new  approaches  to  information  retrieval  that  attempt  to  solve 
some of these problems. It was influenced by a number of areas:

• After abandoning traditional approaches to information retrieval, a knowledge-based
approach was adopted. Structural knowledge representation techniques
showed  promise  for  organizing  the  information,  and  inferencing  techniques  for 
manipulating  structures  showed  promise  for  supporting  more  precise  queries. 
Structural  knowledge  representation  techniques  include  frames,  objects,  and  se- 
mantic networks  which  characterize  the  structure  of  broad  domains  of  knowledge 
and  the  interrelationships  among  data  items.  Terminological  knowledge  repre- 
sentation systems  are  a  special  class  of  such  systems.  In  contrast,  a  rule-based 
approach was rejected because it cannot represent this structure.

•  In  parallel  with  structural  knowledge  representation  techniques,  the  database 
discipline  has  produced  newer,  richer  data  models  known  as  semantic  data  mod- 
els. Closely  related  to  this  is  the  emergence  of  object-oriented  databases.  It  was 
concluded  that  these  new  models  are  essentially  indistinguishable  from  struc- 
tural knowledge  representation.  Yet,  database  management  has  much  to  offer, 
especially  in  the  area  of  efficient  information  access  and  update  via  query  pro- 
cessing and  optimization.  Thus,  a  merging  of  artificial  intelligence  and  database 
management  is  beginning  to  unfold. 

•  In  spite  of  the  enormous  difficulties,  it  was  decided  that  natural  language  pro- 
cessing offered  the  only  solution  to  querying  large  information  retrieval  systems. 
Although  natural  language  may  seem  an  unlikely  choice,  there  are  in  fact  no 
other  alternatives.  On  the  contrary,  it  was  found  that  natural  language  is  funda- 
mentally related  to  the  organization  and  retrieval  of  information.  It  was  quickly 
recognized  that  the  structural  knowledge  representation  techniques  used  to  or- 
ganize the  database  provided  an  excellent  platform  for  building  natural  language 
processors. From then on, the problem of organizing databases and the problem
of representing language were inseparable.

•  There  were  many  contributions  from  fields  outside  computer  science  that  have 
a  direct  bearing  on  this  work.  The  information  retrieval  problem  is  best  viewed 
as  a  cognitive  science  problem.  Much  work  from  developmental  psychology  and 
philosophy  is  relevant. 

Category theory provides a unifying theme for structural knowledge representation,
data modeling, and natural language. Categorization is the process of grouping
entities  into  categories  or  classes.  It  is  proposed  that  terminological  knowledge  rep- 
resentation systems  can  be  improved  by  incorporating  modern  theories  of  categoriza- 
tion. 

The  main  contributions  of  this  dissertation  are  summarized  as  follows: 

1.  A  knowledge  representation  system  is  presented  that  provides  a  more  accurate 
view  of  categories  by  combining  case-based  and  explanation-based  reasoning. 

2.  The  data  modeling  problem  is  viewed  more  generally  as  a  problem  of  catego- 
rization. The  result  is  an  improvement  in  database  design  and  query  processing. 
Some  techniques  from  machine  learning  (from  case-based  and  explanation-based 
reasoning) are applied to data modeling for the first time.

3.  The  results  of  one  and  two  are  used  as  a  basis  for  natural  language  processing, 
particularly  in  the  area  of  lexicon  design  and  automatic  lexical  acquisition. 

4. The results of one, two, and three have practical application in providing a
knowledge-based  approach  to  information  retrieval  and  to  dynamic  modeling 
and  qualitative  simulation. 

It is argued that categories are not adequately treated by existing structural
knowledge representation techniques. Thus, a contribution is made to knowledge
representation. In particular, a class of systems known as terminological knowledge
representation systems is thoroughly evaluated, and major changes are presented.
The result is a better representational account of categories.

At  the  same  time,  current  theories  of  categorization  are  highly  fragmented.  That 
is,  individual  theories  present  only  a  part  of  the  picture.  An  attempt  is  made  at 
integrating  a  number  of  theories  and  experimental  results  into  a  complete  package. 

In  natural  language  processing,  a  theory  of  word  meaning  is  presented  based  on 
new  theories  of  categorization.  This  leads  to  a  new  design  for  representing  lexical 
knowledge.  The  theory  provides  for  automating  the  process  of  lexical  acquisition.  A 
complete  integration  of  natural  language  processing  and  database  query  processing  is 
achieved. 

Finally,  the  theoretical  results  are  directly  applicable  to  solving  problems  originally 
raised  in  information  retrieval.  Because  of  a  special  interest  in  computer  simulation, 
the applicability of the results to modeling is also discussed in detail.

The  following  points  summarize  the  approach  taken: 
Terminological  Knowledge  Representation: 

•  For  the  most  effective  approach  to  information  storage  and  retrieval,  structured 
knowledge  representation  techniques  (objects)  must  be  completely  integrated 
with semantic data modeling. In particular, terminological knowledge representation
systems which provide formal semantics and inferencing operations
will  be  used  as  the  modeling  language.  These  systems  also  provide  a  way  of 
representing  word  meaning,  thus  leading  to  a  tight  coupling  of  natural  language 
processing  with  information  retrieval. 

• Terminological knowledge representation systems incorporate inferencing techniques
based on term subsumption. These techniques (subsumption and classification)
can be used as a new way of querying databases. In contrast to
algebra-based query languages, querying based on terminological reasoning exploits
the domain semantics inherent in the data model.

Influence  of  Category  Theory: 

•  Current  data  models  and  leading  terminological  knowledge  representation  sys- 
tems are  flawed  because  they  promote  an  antiquated  notion  of  categories.  All 
these  systems  attempt  to  group  particular  instances  into  categories,  or  classes. 
Instead,  building  these  systems  to  incorporate  modern  principles  of  categoriza- 
tion will  lead  to  more  realistic  representations  and  more  powerful  inferencing 
capabilities. 

•  Categories  are  complex  clusters  of  concepts.  Categories  cannot  be  formed  on 
the  basis  of  simple  predicates  or  necessary  and  sufficient  conditions  that  are  true 
of  all  the  members  of  the  category.  In  addition,  categories  exhibit  properties 
such as prototype effects, default values, exceptions, and evolution which must
be  accounted  for  in  a  theory  of  categorization. 

•  A  modern  view  of  categories  is  based  on  several  principles  including  family  re- 
semblance, case-based  reasoning,  cognitive  models,  explanation-based  learning, 
and  symbol  grounding. 

•  Cognitive  models  play  an  important  role  in  categorization  and  language  un- 
derstanding. Cognitive  models  can  take  a  variety  of  forms.  One  form  is  the 
traditional  database  class  object.  Another  promising  approach  is  the  use  of 
qualitative  simulation  which  provides  the  ability  to  reason  about  dynamic  pro- 
cesses. 


Conceptual  Clustering: 

•  A  conceptual  clustering  algorithm  is  presented  that  incorporates  these  prin- 
ciples. The  algorithm  can  detect  and  accommodate  exceptions.  However,  the 
algorithm  is  semi-automatic  in  that  it  requires  human  interaction  to  develop  so- 
cially consistent  categories.  It  is  argued  that  a  completely  automatic  algorithm 
is  not  possible. 

•  The  conceptual  clustering  algorithm  works  on  a  combination  of  case-based  rea- 
soning and  explanation-based  learning.  Case-based  reasoning  compares  in- 
stances on  the  basis  of  structural  similarity.  New  class  descriptions  can  be 
generated  that  group  together  instances  having  similar  features.  At  the  same 
time,  classes  can  be  constructed  on  the  basis  of  cognitive  models,  which  include 
database  schemas  as  a  special  case.  Instances  can  belong  to  a  class  to  the  extent 
that  they  conform  to  a  cognitive  model  associated  with  a  class.  Cognitive  mod- 
els are  contained  within  a  class  description.  Explanation-based  learning  is  used 
to  justify  class  membership  of  an  instance  to  the  extent  that  it  conforms  to  the 
model. Finally, category structure evolves as a result of exception handling. An
instance  is  an  exception  to  a  class  if  it  belongs  to  the  class,  yet  fails  to  satisfy 
cognitive  models  associated  with  the  class.  Class  structure  must  be  altered  if 
an  exception  must  be  accepted  into  a  class. 
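
The comparison of instances by structural similarity described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: instances are encoded as flat attribute-to-value dictionaries, and the name `intersect` and the sample instances are hypothetical. It is not CANDIDE's actual representation or the algorithm of Chapter 4.

```python
def intersect(a: dict, b: dict) -> dict:
    """Return the attribute/value pairs shared by two instances.

    Assumption: an instance is a flat dict of attribute -> value. The
    result is read as a candidate class description (possibly empty).
    """
    return {attr: val for attr, val in a.items() if attr in b and b[attr] == val}


# Hypothetical instances used only for illustration.
robin = {"covering": "feathers", "flies": True, "legs": 2}
penguin = {"covering": "feathers", "flies": False, "legs": 2}

# Features common to both instances form a candidate class description.
bird_like = intersect(robin, penguin)
```

In this toy encoding the attribute on which the two instances disagree simply drops out of the generated description; a fuller treatment would need the exception handling and evolution steps described above.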

Natural  Language  Processing,  Lexical  Acquisition: 

•  A  theory  of  meaning  is  contingent  on  this  category  theory.  Words  in  natural 
language  are  associated  with  categories.  A  system  that  can  build  categories  also 
provides  the  capability  for  representing  word  meaning  in  a  realistic  fashion. 

•  Machine  learning  is  an  essential  part  of  the  information  system.  The  lexical 
acquisition problem can be addressed by using the conceptual clustering algorithm.
A natural language processing system must have the ability to adapt to
new usage and unfamiliar words. It is shown how a lexical acquisition algorithm
can  be  implemented  using  the  categorization  theory  presented. 

Research  Method: 

• This project was approached as a multidisciplinary application of cognitive science.
The problems being addressed here are of interest to psychology, philosophy,
and linguistics as well as artificial intelligence and database management.
Studying  all  these  areas  not  only  identified  common  interests,  but  resulted  in 
valuable  cross-fertilization.  For  example,  the  important  concepts  of  categoriza- 
tion are  rarely  discussed  in  the  artificial  intelligence  and  database  literature. 
Most  of  the  concepts  on  categorization  presented  here  come  from  cognitive 
psychology,  with  parallel  arguments  discussed  in  analytical  philosophy  and  phi- 
losophy of  language.  However,  these  disciplines  lack  the  formal  techniques  and 
algorithms  that  computer  science  provides. 

Chapter  2  presents  the  CANDIDE  semantic  data  model  and  terminological  knowl- 
edge representation  system  which  is  used  as  a  knowledge  representation  and  data 
modeling  language.  The  results  of  early  work  involving  the  use  of  classification  and 
subsumption  as  query  processing  techniques  are  discussed.  The  first  attempts  at  fully 
integrating  natural  language  processing  with  the  database  are  also  presented.  This 
early  work  resulted  in  the  identification  of  a  number  of  major  problems  with  current 
approaches  to  terminological  reasoning.  These  problems  are  discussed  in  order  to  set 
the  stage  for  the  new  approach  presented  in  subsequent  chapters. 

Chapter  3  presents  the  main  thesis:  the  representation  of  meaning  should  be  based 
on  a  modern  view  of  categorization.  Data  models,  structural  knowledge  representa- 
tion systems,  queries,  and  natural  language  processing  are  all  related  by  a  theory 
of categorization. This chapter discusses the history of category theory, as well as
the  details  of  the  modern  theory.  Relevant  work  in  psychology  and  philosophy  is 
discussed  in  addition  to  contributions  from  computer  science. 

Chapter  4  presents  the  conceptual  clustering  algorithm.  This  algorithm  groups 
CANDIDE  instances  into  classes  in  a  semi-automated  fashion.  Several  new  opera- 
tions are  introduced.  A  function  called  INTERSECT  determines  what  if  anything 
two  instances  have  in  common.  An  Exception  Condition  determines  necessary  con- 
ditions for  identifying  instances  which  are  exceptions  to  existing  class  descriptions. 
Evolution  provides  a  way  of  modifying  class  descriptions  to  accommodate  exceptions. 
Prototypes  and  default  values  are  determined  by  reasoning  over  sets  of  instances.  A 
review  of  the  literature  on  conceptual  clustering  is  also  presented. 

Chapter  5  shows  how  the  conceptual  clustering  algorithm  is  applied  to  natural 
language  processing.  Evidence  for  category  effects  at  the  level  of  phonetics,  morphol- 
ogy, syntax,  and  semantics  is  presented.  The  Subsumption  and  Intersection  functions 
are  illustrated  in  detail  at  the  syntactic  level  for  unification  grammar  formalisms. 
The  lexical  acquisition  problem  is  introduced  by  illustrating  the  role  of  Exception 
handling,  Evolution,  and  default  reasoning.  A  lexical  acquisition  algorithm  is  then 
presented,  followed  by  a  detailed  example. 

Chapter  6  illustrates  the  importance  of  cognitive  models  in  language  understand- 
ing. In  particular,  the  use  of  qualitative  simulation  techniques  as  cognitive  models  is 
discussed  with  several  examples.  The  integration  of  language  processing  with  simu- 
lation techniques  leads  to  improved  language  understanding. 

Chapter  7  summarizes  conclusions  and  identifies  future  research  directions. 


CHAPTER  2 
THE  CANDIDE  SEMANTIC  DATA  MODEL 

2.1    Introduction 


This  chapter  introduces  the  CANDIDE  semantic  data  model.  CANDIDE  serves 
several  purposes.  It  is  used  both  as  a  data  model  and  a  structural  knowledge  repre- 
sentation language  (there  is  no  need  to  distinguish  between  a  database  and  a  knowl- 
edge base).  CANDIDE  is  a  notation  for  building  structural  descriptions  of  concepts 
and  objects.  In  addition,  several  inferencing  operations  are  available  to  manipulate 
these  descriptions.  A  database  management  system  based  entirely  on  CANDIDE  has 
also  been  implemented,  that  is,  information  can  be  stored  in  a  CANDIDE  database. 
CANDIDE  has  been  used  successfully  to  model  several  different  domains. 

CANDIDE  originated  from  a  class  of  systems  known  as  terminological  knowledge 
representation  systems  (also  called  terminological  logics,  terminological  reasoners,  or 
term  subsumption  languages).  The  first  such  system  was  KL-ONE  [9],  and  subse- 
quently a  long  series  of  systems  evolved.  Terminological  knowledge  representation 
systems  are  unique  in  that  1)  they  are  formal  languages  with  well-defined  syntax 
and  semantics,  and  2)  they  provide  inferencing  operations  that  can  manipulate  ob- 
ject structures.  In  these  systems,  objects  represent  terminology.  Thus,  these  systems 
provide  an  initial  basis  for  representing  and  reasoning  about  terminology  and  word 
meaning. In contrast, other representation techniques such as formal logic and production
rules cannot explicitly represent the structural and terminological relationships
inherent in databases.

This chapter reports preliminary work which was done with CANDIDE. The preliminary
work resulted in a new approach to handling database queries and integrating
natural language processing with databases. The use of classification and subsumption,
two inferencing operations provided by CANDIDE, to process database queries
will  be  discussed  in  detail.  A  novel  feature  of  this  approach  is  that  the  data  definition 
language  (DDL)  and  data  manipulation  language  (DML)  are  identical,  thus  providing 
uniform  treatment  of  data  objects,  query  objects,  and  view  objects.  The  classification 
algorithm  finds  the  correct  placement  for  a  query  object  in  a  given  object  hierarchy. 
The  fundamental  criterion  for  such  correct  placement  is  the  subsumption  relationship 
between  two  object  classes.  Tractability  issues  are  explored,  and  the  expressiveness 
of  queries  is  compared  with  relational  algebra.  It  is  then  shown  how  natural  language 
queries  can  be  directly  integrated  with  these  query  processing  techniques. 
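
The placement of a query object by subsumption can be illustrated with a small Python sketch. It rests on strong simplifying assumptions that are not part of CANDIDE: a class description is reduced to a dictionary mapping attribute names to inclusive numeric ranges, all descriptions in the schema are pairwise distinct, and the names `subsumes`, `classify`, and the sample schema are hypothetical.

```python
import math


def subsumes(general: dict, specific: dict) -> bool:
    """True if every constraint of `general` is implied by `specific`."""
    for attr, (g_lo, g_hi) in general.items():
        if attr not in specific:
            return False
        s_lo, s_hi = specific[attr]
        if s_lo < g_lo or s_hi > g_hi:
            return False
    return True


def classify(query: dict, schema: dict):
    """Return (most specific subsumers, most general subsumees) of the query."""
    above = {n for n, d in schema.items() if subsumes(d, query)}
    below = {n for n, d in schema.items() if subsumes(query, d)}
    # Drop any subsumer that is more general than another subsumer.
    parents = {p for p in above
               if not any(q != p and subsumes(schema[p], schema[q]) for q in above)}
    # Drop any subsumee that is more specific than another subsumee.
    children = {c for c in below
                if not any(q != c and subsumes(schema[q], schema[c]) for q in below)}
    return parents, children


# Hypothetical schema: class name -> {attribute: (low, high)}.
schema = {
    "Thing": {},
    "Person": {"age": (0, math.inf)},
    "Adult": {"age": (21, math.inf)},
    "Senior": {"age": (65, math.inf)},
}

# A query for "persons aged 40 or older" is itself a class description.
parents, children = classify({"age": (40, math.inf)}, schema)
```

Under these assumptions the query is placed below `Adult` and above `Senior`, so every instance of `Senior` is known to satisfy the query without being tested individually.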

This preliminary work identified many major shortcomings in existing terminological
knowledge representation systems. These problems have to do with how these
systems  represent  categories.  The  problems  resulted  in  a  reformulation  of  terminolog- 
ical knowledge  representation  in  terms  of  new  theories  of  categorization.  This  chapter 
describes  the  background  research  which  led  to  this  reformulation.  The  main  themes 
of data modeling, query processing based on terminological reasoning, and integrated
natural  language  processing  are  introduced. 

2.2   Overview  of  the  CANDIDE  Data  Model 

CANDIDE  is  derived  from  the  terminological  knowledge  representation  systems 
FL-  [65],  KANDOR  [86],  and  BACK  [83].  Extensions  were  made  to  these  languages  to 
make  them  more  suitable  for  data  modeling.  A  notation  was  developed  for  CANDIDE 
which conforms to more common data modeling terms. Also, additional constructs
have been added to simplify representing standard data types. For example, the type
constructors RANGE, SET, and COMPOSITE domain have been added. These do
not really increase the expressiveness of the model or affect computational complexity,
but they make the model easier to apply. The same concepts could be represented
in the original models, but with great difficulty. For example, in the original
KANDOR,  the  concept  of  "21  or  older"  would  have  to  be  represented  by  creating  one 
instance  of  age  for  each  integer  from  21  up  to  some  arbitrarily  large  value,  and  placing 
them  all  under  a  special  class.  With  the  extensions,  this  can  be  handled  by  simply 
using the "RANGE [21,NIL]" construct. Care has been taken not to introduce any
constructs  that  would  affect  computational  complexity.  This  was  done  by  requiring 
any  new  construct  to  be  reducible  to  the  original  constructs.  These  extensions  make 
it  easier  to  state  queries,  as  will  be  illustrated  in  Section  2.4.  This  section  gives  a  brief 
overview  of  the  CANDIDE  data  model.  One  can  see  that  it  is  at  least  as  expressive 
as  most  semantic  data  models. 
2.2.1    Structural  Aspect  of  the  Data  Model 

CANDIDE  models  structured  objects  as  classes  and  instances.  Classes  represent 
generic  concepts,  and  instances  represent  particular  occurrences  of  a  concept.  In- 
stances have  associated  properties  represented  by  attributes.  The  BNF  of  the  CAN- 
DIDE data  model  is  shown  in  Figure  2.1.  The  model  explicitly  supports  the  abstrac- 
tions of  aggregation,  generalization  [110],  identification,  and  classification  [89]  (not 
to  be  confused  with  the  classification  algorithm  described  below).  The  association 
abstraction  [10]  can  be  treated  as  a  special  case  of  aggregation,  but  is  not  explicitly 
supported.  An  association  between  two  or  more  object  classes  is  modeled  as  a  class 
which  is  the  aggregation  of  attributes  referencing  each  member  class  (of  the  associ- 
ation), in  addition  to  other  attributes  which  the  association  may  have.  The  model 
has  four  semantic  categories:  classes,  attributes,  instances  and  disjoint  classes.  The 
literals <classname>, <attr-name>, <inst-name> and <disjoint-name> are strings
that  uniquely  identify  objects  in  each  category.  Strings,  integers,  and  real  numbers 
are  system  built-in  atomic  types  or  classes. 

The  database  schema  consists  of  two  hierarchies  (Figure  2.2),  one  for  classes  and 
the other for attributes. The root of the class hierarchy is a universal class called
"Thing."  In  the  class  hierarchy,  a  class  may  have  more  than  one  parent.  In  the 
attribute  hierarchy,  which  also  has  a  root  called  "Thing,"  each  attribute  can  have  at 
most  one  parent  attribute  along  with  an  associated  domain.  This  domain  is  either 
an  instance,  or  a  set  of  instances  typified  by  an  object  class  described  in  the  class 
hierarchy.  The  domain  of  an  attribute  must  be  a  subclass  of  the  domain  of  its  parent 
attribute. A separate attribute hierarchy also gives the user more flexibility in object
or  query  specification,  as  will  be  illustrated  in  Section  2.4. 
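
The two hierarchies and the rule relating an attribute's domain to that of its parent can be sketched in Python. The sketch assumes a toy schema held in plain dictionaries; the class and attribute names (`Person`, `knows`, `supervises`, and so on) and the helper names are hypothetical and serve only to illustrate the rule.

```python
# Hypothetical class hierarchy: class name -> list of parent classes.
# A class may have more than one parent.
CLASS_PARENTS = {
    "Thing": [],
    "Person": ["Thing"],
    "Employee": ["Person"],
    "Student": ["Person"],
    "WorkingStudent": ["Employee", "Student"],
}

# Hypothetical attribute hierarchy: attribute -> (single parent or None, domain class).
ATTRS = {
    "Thing": (None, "Thing"),
    "knows": ("Thing", "Person"),
    "supervises": ("knows", "Employee"),
}


def is_subclass(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it through any chain of parents."""
    if cls == ancestor:
        return True
    return any(is_subclass(p, ancestor) for p in CLASS_PARENTS[cls])


def domain_is_valid(attr: str) -> bool:
    """Check that an attribute's domain is a subclass of its parent's domain."""
    parent, domain = ATTRS[attr]
    if parent is None:
        return True
    return is_subclass(domain, ATTRS[parent][1])
```

For example, `supervises` is acceptable below `knows` because `Employee` is a subclass of `Person`, whereas an attribute placed below `supervises` with domain `Student` would violate the rule.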

An  attribute  appearing  in  a  class  description  can  be  qualified  by  additional  value 
constraints  on  its  domain  specified  within  the  class  description.  Further,  these  at- 
tribute constraints  in  a  class  must  logically  imply  the  constraints  on  each  attribute  of 
each  of  its  superclasses.  They  also  specify  requirements  for  instances  to  be  members 
of  the  class.  An  object  class  can  have  many  superclasses,  subclasses,  and  instances. 
A  disjoint  class  means  that  the  named  subclasses  of  a  given  class  cannot  have  any 
common  instance. 

A  class  can  be  either  primitive  or  defined  [9].  Primitive  classes  represent  con- 
cepts that  cannot  be  fully  specified,  that  is,  the  attribute  constraints  are  necessary 
but  not  sufficient  conditions  for  class  membership.  They  generally  occur  at  the  top 
of  a  hierarchy.  There  are  two  interesting  kinds  of  primitive  classes.  The  first  kind 
includes  concepts  such  as  Schank's  conceptual  dependencies  [101],  "action,"  "agent," 
"object,"  which  are  used  to  define  other  classes  but  cannot  be  defined  themselves. 
Second,  there  are  concepts  which  cannot  be  expressed  within  the  modeling  constructs 
provided  by  CANDIDE,  and  these  must  be  treated  as  primitive  classes.  For  example, 
a  polygon  can  be  described  as  a  set  of  line  segments,  but  it  may  not  be  possible  to 
express  the  requirements  of  closure  and  non-intersection  on  this  set.  Thus,  a  user  is 
forced to declare a polygon as a primitive class. Users must explicitly specify which
subclasses and instances belong to a given primitive class.

In  contrast,  the  attribute  constraints  of  a  defined  class  are  necessary  and  suffi- 
cient conditions  for  class  membership.  This  means  defined  classes  represent  concepts 
that  can  be  fully  specified  and  therefore,  class  membership  in  defined  classes  can  be 
automatically  decided. 
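
The difference between primitive and defined classes can be made concrete with a short Python sketch. It assumes, purely for illustration, that attribute constraints are given as predicates on a single attribute value and that membership in a primitive class is recorded as a set of user-asserted instance names; `ClassDesc`, `satisfies`, and `is_member` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ClassDesc:
    name: str
    primitive: bool
    # attribute name -> predicate on the value (stand-in for attribute constraints)
    constraints: Dict[str, Callable[[object], bool]]
    # instance names the user has explicitly asserted (used by primitive classes)
    asserted_members: Set[str] = field(default_factory=set)


def satisfies(cls: ClassDesc, inst: dict) -> bool:
    """True if the instance meets every attribute constraint of the class."""
    return all(a in inst and ok(inst[a]) for a, ok in cls.constraints.items())


def is_member(cls: ClassDesc, inst_name: str, inst: dict) -> bool:
    if not satisfies(cls, inst):
        return False  # constraints are necessary for both kinds of class
    if cls.primitive:
        # Necessary but not sufficient: membership must also be asserted.
        return inst_name in cls.asserted_members
    return True  # defined class: constraints are also sufficient


adult = ClassDesc("Adult", primitive=False, constraints={"age": lambda v: v >= 21})
polygon = ClassDesc("Polygon", primitive=True,
                    constraints={"segments": lambda v: v >= 3},
                    asserted_members={"triangle1"})
```

Here an instance with four line segments is not accepted as a `Polygon` unless the user asserts it, since closure and non-intersection cannot be checked, while any instance with a suitable `age` is accepted as an `Adult` automatically.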

Similarly,  an  instance,  which  can  also  have  more  than  one  parent  class,  must  have 
attributes  and  values  that  satisfy  the  attribute  constraints  of  its  parents.  An  instance 
can  have  more  attributes  than  defined  in  its  parent  classes,  and  the  values  for  these 
additional  attributes  are  constrained  only  by  the  domains  specified  in  the  attribute 
hierarchy. Thus, the immediate or most specific parent classes of the instance can be
automatically  deduced. 
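
One way to picture this deduction is the following Python sketch. It assumes that every class is a defined class, that constraints are inclusive numeric ranges on single-valued attributes, and that `None` stands for an unbounded end; the schema and the function names are hypothetical.

```python
# Hypothetical schema: class -> parent classes, and class -> range constraints.
PARENTS = {"Thing": [], "Person": ["Thing"], "Adult": ["Person"], "Senior": ["Adult"]}
CONSTRAINTS = {
    "Thing": {},
    "Person": {"age": (0, None)},
    "Adult": {"age": (21, None)},
    "Senior": {"age": (65, None)},
}


def ancestors(cls: str) -> set:
    """All classes reachable from cls by following parent links."""
    seen, stack = set(), list(PARENTS[cls])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def satisfies(inst: dict, cls: str) -> bool:
    """True if the instance meets every range constraint of the class."""
    for attr, (lo, hi) in CONSTRAINTS[cls].items():
        if attr not in inst:
            return False
        v = inst[attr]
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            return False
    return True


def most_specific_parents(inst: dict) -> set:
    """Classes the instance satisfies, minus those implied by a more specific one."""
    matches = {c for c in PARENTS if satisfies(inst, c)}
    covered = set()
    for c in matches:
        covered |= ancestors(c)
    return matches - covered
```

With this toy schema, an instance whose `age` is 30 is placed directly under `Adult`, and membership in `Person` and `Thing` follows from the hierarchy.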
2.2.2   Constraint  Specification 

A  class  description  comprises  its  superclasses,  subclasses,  instances  and  attributes. 
In  addition,  one  can  specify  constraints  on  these  attributes.  There  are  four  kinds 
of  constraints:  "max,"  "some,"  "exactly,"  and  "all."  The  "max"  constraint  means 
that  an  attribute  can  have  at  most  a  specified  number  of  value  fillers.  The  "some" 
constraint  means  that  there  exist  at  least  a  specified  number  of  value  fillers,  each  value 
belonging to a certain domain qualified by value constraints, <vc> (see Figure 2.1
for  BNF).  The  "exactly"  constraint  says  that  exactly  a  specified  number  of  attribute 
fillers  must  satisfy  a  value  constraint;  it  is  the  combination  of  "some"  and  "max." 
The  "all"  constraint  specifies  that  all  values  of  an  attribute  must  belong  to  a  domain 
qualified by <vc>. Note the similarity of the "all" and "some" constraints to the
universal  and  existential  quantifiers  of  first  order  predicate  calculus  [65]. 
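
The four kinds of constraints can be read as checks over the list of value fillers of an attribute. The Python sketch below is an illustrative reading of the definitions above, not CANDIDE's implementation: a value constraint is modeled as a predicate, and the function names are hypothetical. Following the text, "exactly" is written as the combination of "some" and "max".

```python
from typing import Callable, Sequence

ValueConstraint = Callable[[object], bool]


def at_most(fillers: Sequence, n: int) -> bool:
    """'max': the attribute has at most n value fillers."""
    return len(fillers) <= n


def at_least(fillers: Sequence, n: int, vc: ValueConstraint) -> bool:
    """'some': at least n fillers satisfy the value constraint."""
    return sum(1 for f in fillers if vc(f)) >= n


def exactly(fillers: Sequence, n: int, vc: ValueConstraint) -> bool:
    """'exactly': the combination of 'some' and 'max' for the same n."""
    return at_least(fillers, n, vc) and at_most(fillers, n)


def all_values(fillers: Sequence, vc: ValueConstraint) -> bool:
    """'all': every filler satisfies the value constraint."""
    return all(vc(f) for f in fillers)


# Hypothetical example: ages filling an attribute, with "21 or older" as the constraint.
is_adult = lambda age: age >= 21
ages = [25, 30, 10]
```

With these fillers, `at_least(ages, 2, is_adult)` holds while `all_values(ages, is_adult)` does not, which mirrors the existential and universal readings noted above.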

Value constraints on attributes specify domains (DOMAIN <type-c>) or actual
values (VALUE <type-i>). Domains may be specified by naming the class or type
as in CLASS, STRING, INTEGER, or REAL, or by using the type constructors
RANGE, SET, SETDIF, and COMPOSITE. RANGE specifies a range of values between
some upper and lower bounds which may be inclusive, exclusive, or NIL
(+/- Infinity). RANGE is currently typed over reals and integers. The SET construct
allows a set of domains to be specified such that the attribute values must belong to
the union of these domains. A SET may recursively include any construct defined in
<type-c>. SETDIF allows a special form of negation (set difference) to be handled in
a safe way. For instance, "SETDIF f g," where f is a class and g is its subclass (or an
instance which belongs to f), means a set consisting of only those instances belonging
to f and not to g. The COMPOSITE domain is the aggregation of other (possibly
complex)  domains,  in  which  each  component  domain  is  labeled  by  an  attribute  name 
along  with  its  constraints.  For  instance,  a  COMPOSITE  domain  for  an  attribute 
called  Date  would  have  component  attributes  of  Month,  Day,  and  Year.  The  type 
constructors  RANGE,  SET,  SETDIF,  and  COMPOSITE  make  it  possible  to  describe 
complex  domains  without  having  to  create  additional  classes  and  instances  as  would 
have  been  required  in  the  original  models. 
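
As an illustration of how these type constructors describe domains, the following Python sketch tests whether a value belongs to a domain. The tuple encoding, the function names, and the small `Day`/`Weekend` hierarchy are assumptions made for this example only. RANGE bounds are treated as inclusive with `None` standing for NIL, and only the class form of SETDIF is handled.

```python
# Hypothetical single-parent hierarchy and instances, for illustration only.
SUBCLASS_OF = {"Weekend": "Day", "Weekday": "Day"}
INSTANCE_OF = {"sat": "Weekend", "mon": "Weekday"}


def belongs(value, cls: str) -> bool:
    """True if the named instance belongs to cls directly or via superclasses."""
    if not isinstance(value, str):
        return False
    c = INSTANCE_OF.get(value)
    while c is not None:
        if c == cls:
            return True
        c = SUBCLASS_OF.get(c)
    return False


def is_number(v) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def in_domain(value, dom: tuple) -> bool:
    kind = dom[0]
    if kind == "INTEGER":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "CLASS":
        return belongs(value, dom[1])
    if kind == "RANGE":      # ("RANGE", low, high); None plays the role of NIL
        _, lo, hi = dom
        return is_number(value) and (lo is None or value >= lo) and (hi is None or value <= hi)
    if kind == "SET":        # union of the member domains
        return any(in_domain(value, d) for d in dom[1:])
    if kind == "SETDIF":     # instances of f that are not instances of g
        _, f, g = dom
        return belongs(value, f) and not belongs(value, g)
    if kind == "COMPOSITE":  # ("COMPOSITE", {component attribute: domain})
        return isinstance(value, dict) and all(
            a in value and in_domain(value[a], d) for a, d in dom[1].items())
    raise ValueError(f"unknown domain constructor: {kind}")


ADULT = ("RANGE", 21, None)  # "21 or older"
DATE = ("COMPOSITE", {"Month": ("RANGE", 1, 12),
                      "Day": ("RANGE", 1, 31),
                      "Year": ("INTEGER",)})
```

In this encoding the "21 or older" concept needs no additional classes or instances: `in_domain(34, ADULT)` is decided directly from the bounds.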

A  sample  database  with  its  class  and  attribute  hierarchies  is  shown  in  Figures  2.2 
and  2.3.  Next  it  will  be  described  how  classification  is  exploited  to  maintain  the  class 
hierarchy,  enforce  constraints,  and  process  queries. 

2.3    Classification  and  Subsumption 

A  significant  departure  from  traditional  database  querying  techniques  is  that  a 
query  is  treated  just  as  any  other  object  described  in  the  database  schema.  An  object 
definition  is  prescriptive  [33]  in  the  sense  that: 

1.  It  provides  necessary  and  sufficient  conditions  for  class  membership  that  can 
be  used  to  deduce  additional  relationships  among  objects  not  specified  by  the 
user.

<class> ::= <classname> CLASS <primitive-flag>
            [SUPERCLASSES <superclass>+]
            [SUBCLASSES <subclass>+]
            [INSTANCE-LIST <inst>+]
            [ATTR-CONSTRAINTS <attr-constraint>+]

<instance> ::= <inst-name> INSTANCE [<iparent>+] [ATTR <attr-value>+]

<attr> ::= <attr-name> ATTR [<attr-parent>] <vc>

<disjoint-class> ::= <disjoint-name> DISJOINT <classname> <disj>+

<primitive-flag> ::= PRIMITIVE | DEFINED

<attr-constraint> ::= <attr-name> <constraint>+

<attr-value> ::= <attr-name> <type-i>+

<constraint> ::= <max> | <some> | <exactly> | <all>

<max> ::= ATMOST <integer>

<some> ::= ATLEAST <integer> <vc>

<exactly> ::= EXACTLY <integer> <vc>

<all> ::= ALL <vc>

<vc> ::= (DOMAIN <type-c>) | (VALUE <type-i>) | NIL

<type-c> ::= (CLASS <classname>) | STRING | INTEGER | REAL |
             (RANGE <range>) | (SET <type-c>+) |
             (SETDIF <classname> ',' <classname>) |
             (COMPOSITE <attr-constraint>+)

<type-i> ::= (CLASS <classname>) | (INSTANCE <inst-name>) |
             (STRING <string>) | (INTEGER <integer>) | (REAL <real>) |
             (RANGE <range>) | (SET <type-i>+) |
             (SETDIF <classname> ',' <classname>) |
             (COMPOSITE <attr-value>+)

<range> ::= ( '(' | '[' ) ( <num> | NIL ) ',' ( NIL | <num> ) ( ')' | ']' )
<num> ::= <real> | <integer>

<superclass> ::= <classname>
<subclass> ::= <classname>
<inst> ::= <inst-name>
<iparent> ::= <classname>
<attr-parent> ::= <attr-name>
<disj> ::= <classname>
<classname> ::= <string>
<inst-name> ::= <string>
<attr-name> ::= <string>
<disjoint-name> ::= <string>

Figure 2.1: BNF Grammar for CANDIDE

Figure 2.2: The University Database, a) Class Hierarchy, b) Attribute Hierarchy

Person

CLASS  DEFINED 

SUPERCLASS  Thing 

ATTRIBUTE  CONSTRAINTS 

Name:  EXACTLY  1  DOMAIN  STRING 
Ssno:  EXACTLY  1  DOMAIN  STRING 

Employee 

CLASS  DEFINED 

SUPERCLASS  Person 

ATTRIBUTE  CONSTRAINTS 

Occupation:  ALL  DOMAIN  CLASS  Occupation 
Department:  ALL  DOMAIN  CLASS  Department 

Student 

CLASS  DEFINED 

SUPERCLASS  Person 

ATTRIBUTE  CONSTRAINTS 

Courses:  ALL  DOMAIN  CLASS  Course 
Advisor:  ALL  DOMAIN  CLASS  Advisor 
Major:  ALL  DOMAIN  CLASS  College 
GPA:  EXACTLY  1  DOMAIN  RANGE  [0.0,4.0] 

Advisor 

CLASS  DEFINED 
SUPERCLASS  Employee 
ATTRIBUTE  CONSTRAINTS 

Students:  ALL  DOMAIN  CLASS  Student 

Course 

CLASS  DEFINED 

SUPERCLASS  Thing 

ATTRIBUTE  CONSTRAINTS 

Instructor:  ALL  DOMAIN  CLASS  Teacher 
Department:  ALL  DOMAIN  CLASS  Department 
Students:  ALL  DOMAIN  CLASS  Student 

Instruct 

CLASS  DEFINED 

SUPERCLASS  Teach 

ATTRIBUTE  CONSTRAINTS 

Teacher:  ALL  DOMAIN  CLASS  Teacher 
Course:  ALL  DOMAIN  CLASS  Course 


Figure 2.3: Some Definitions in the CANDIDE Class Lattice

2. It provides the minimal class description for any object to be considered a
member of the corresponding type set, i.e., an object instance can have additional
attributes beyond what is prescribed in each parent class, thus relaxing the fixed
arity constraint.

3.  Typing  information  is  attached  to  the  object  class  in  terms  of  its  superclasses, 
subclasses,  and  domain  restrictions  on  its  attribute  labels  (this  is  similarly  true 
for  object  instances). 

4. Since the attributes are labeled, traditional constraints such as fixed position
(ordering of attributes) can be relaxed.

This interpretation must be based on the semantics of the subsumption relationship.
A class F subsumes a class G if and only if every instance of G is also an
instance of F, i.e., F is a superclass of G. This subsumption relationship is computed
on the basis of whether the attribute constraints for class G logically imply the
attribute constraints for class F. The classification operation can compute the missing
relationships by controlled application of the subsumption function, and completely
specify the class hierarchy. Details of the inferencing rules used in the classifier are
presented in Patel-Schneider [86].

This  same  classification  process  can  be  used  to  compute  the  results  of  a  query. 
Classification can be viewed as the process of correctly locating a new object
in an existing hierarchy. The correct location is immediately below the most
specific classes which subsume the new class and immediately above the most general
classes subsumed by this new class. Classification involves a combination of
depth-first/breadth-first search of the class hierarchy, beginning at known superclasses of
the object to be classified, and applying the subsumption function, continuing as long
as it succeeds. Thus, a query object specification is classified against the complete
hierarchy. The required object instances are obtained from the union of the set of
instances  of  all  the  superclasses  of  the  query  object,  subject  to  additional  attribute 
constraints  of  the  query  object. 
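
The search described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the CANDIDE classifier or the inferencing rules of [86]: a class description is reduced to a mapping from attribute names to sets of permitted values, and the names subsumes, Node, and classify, together with the sample hierarchy, are assumptions made here. Only the downward search for the most specific subsumers is shown.

```python
from dataclasses import dataclass, field


def subsumes(general: dict, specific: dict) -> bool:
    """True if anything satisfying `specific` also satisfies `general`.

    Each description maps an attribute name to the set of values it permits,
    so `specific` must constrain every attribute of `general` at least as tightly.
    """
    return all(
        attr in specific and specific[attr] <= domain
        for attr, domain in general.items()
    )


@dataclass
class Node:
    description: dict
    children: list = field(default_factory=list)


def classify(new_description: dict, hierarchy: dict, start: str) -> list:
    """Return the most specific classes at or below `start` that subsume the new object.

    `start` is assumed to be a known superclass of the object being classified.
    """
    found, frontier = [], [start]
    while frontier:
        name = frontier.pop()
        narrower = [
            child for child in hierarchy[name].children
            if subsumes(hierarchy[child].description, new_description)
        ]
        if narrower:
            frontier.extend(narrower)  # keep descending while subsumption succeeds
        elif name not in found:
            found.append(name)         # no child subsumes: a most specific subsumer
    return found


# Hypothetical fragment of a university hierarchy.
person = {"Ssno": frozenset({"STRING"})}
student = {**person, "Major": frozenset({"Computer Science", "Electrical", "History"})}
employee = {**person, "Department": frozenset({"Registrar", "Electrical"})}
hierarchy = {
    "Person": Node(person, ["Student", "Employee"]),
    "Student": Node(student),
    "Employee": Node(employee),
}

# Query object: persons whose Major is Computer Science.
query = {**person, "Major": frozenset({"Computer Science"})}
print(classify(query, hierarchy, "Person"))  # ['Student']
```

Once the most specific subsuming classes are known, only their instances need to be tested against the remaining attribute constraints of the query object.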

2.4   Classification  as  a  Query  Processing  Technique 

Much  has  been  said  of  the  semantic  inadequacies  of  the  three  classical  data  models, 
namely,  the  hierarchical,  network,  and  relational  models.  These  inadequacies  have 
spawned  a  large  number  of  proposed  semantic  or  object-oriented  data  models  [46]. 
At  first  these  models  were  used  by  designers  for  describing  a  database  at  a  conceptual 
level.  This  conceptual  schema  was  then  translated  into  a  hierarchical,  network,  or 
relational  schema.  Now  semantic  data  models  are  increasingly  being  used  as  database 
systems  directly  [108,15,55,115,129,133].  Thus,  interest  in  strategies  for  querying  such 
systems  is  emerging. 

The  query  languages  being  proposed  for  these  semantic  data  models  are  usually 
based  on  some  algebra  and  tend  to  have  a  SQL-  or  QUEL-like  syntax  [69,85,21,129,1]. 
Queries  expressed  in  these  languages  cannot  be  described  by  the  underlying  data 
models,  and  hence  the  DDL  and  DML  are  distinct  features  of  the  system.  This  can  be 
termed  the  operational  approach.  The  DML  specifies  manipulations  to  be  performed 
on  the  database,  which  are  then  internally  translated  into  a  sequence  of  algebraic 
operations.  It  does  not  exploit  the  structural  relationships  or  domain  constraints  that 
are  specified  in  the  schema  of  the  semantic  data  model.  The  associated  problems  of 
this  approach  are  as  follows: 

1.  The  query  language  syntax  has  no  bearing  on  the  underlying  data  model. 

2.  The  structural  and  semantic  constraints  of  the  model  are  not  exploited  to  aid 
in  query  interpretation,  query  reuse,  or  query  reformulation. 
3. Because the approach is fragmented, it is difficult to treat data objects, queries, and
views homogeneously through a single language.

4.  View  definitions  in  these  languages  lead  to  serious  update  problems. 

5.  Even  though  the  query  language  may  be  declarative,  the  user  still  has  to  think  of 
how  (in  terms  of  algebraic  operations)  and  from  where  (logical  access  paths,  plus 
exact  names  of  attributes  and  objects)  to  get  the  desired  data  before  specifying 
the  query. 

The  DDL  and  DML  dichotomy  in  these  database  systems,  coupled  with  a  lack  of 
reflection  of  the  model  semantics  in  the  query  languages  or  DMLs,  is  at  the  core  of 
the  above  problems.  This  has  been  somewhat  alleviated  in  the  relational  model  by 
marrying  it  with  logic  as  exemplified  by  the  so-called  deductive  databases  [36].  But 
there  is  no  such  marriage  known  between  a  semantic  data  model  and  logic.  A  more 
homogeneous  specification  and  behavior  of  objects,  queries,  and  views  is  desirable. 
This  necessarily  means  collapsing  the  DDL  and  DML  into  a  single  coherent  language. 
Further,  this  language  must  be  formally  specified  so  as  to  reflect  the  semantics  of 
the  data  model.  Ideally,  this  data  model  should  be  a  correct,  complete  and  tractable 
model  of  computation,  and  yet  be  expressive  enough  to  be  useful. 

With  this  objective  in  mind,  it  is  proposed  that  one  way  to  enforce  database 
integrity and also process queries is based on deductive reasoning about object
definitions. Classification and subsumption functions are used for such reasoning about
structural  relationships  among  objects.  Subsumption  relationships  determine  if  one 
object  is  a  special  case  of  another.  Classification  is  a  search  technique  that  correctly 
places  new  objects  into  an  existing  hierarchy  by  repeatedly  applying  a  subsumption 
function.  These  have  also  been  described  as  terminological  logics  since  structural 
relationships  represent  object  definitions  and  the  terms  for  describing  data  [83]. 
The notions of classification and subsumption have been formally developed in
a series of frame-based knowledge representation languages which explore the
computational complexity of subsumption. These include FL and FL- [65], BACK [83],
KL-ONE  [9],  NIKL  [50],  and  KANDOR  [86,87].  The  disappointing  conclusion  of 
these  studies  is  that  subsumption  becomes  intractable  unless  the  constructs  of  the 
language  are  carefully  and  narrowly  constrained.  Semantic  data  models  have  not 
been concerned with these issues because they either rely on operational query
languages or serve as conceptual modeling tools which do not support queries at all.
Few  are  capable  of  supporting  terminological  reasoning,  and  there  are  no  known  data 
models  that  address  the  tractability  requirement. 

Of these frame-based languages, FL-, KANDOR, and BACK explore the limit of
expressibility and tractability, but out of these restricted languages only FL- has a
subsumption function that can be executed in polynomial time [83]. CANDIDE is
based on extensions of these languages in order to facilitate their applicability to data
modeling. CANDIDE illustrates that querying by classification is a viable technique
for querying semantic data models. In particular, the following points are emphasized:

1.  The  DDL  and  DML  become  a  single  language.  Thus,  the  semantics  of  the 
database  schema  is  automatically  exploited  for  query  processing. 

2.  Since  query  and  view  objects  are  treated  as  new  class  definitions,  they  are 
represented  as  object  definitions  and  behave  in  the  same  way  as  data  objects. 

3.  The  approach  lends  itself  to  building  an  integrated  natural  language  processor 
that  derives  information  from  the  database  in  order  to  resolve  a  natural  language 
query. 

4.  It  is  possible  to  couple  this  approach  as  a  semantic  layer  on  top  of  existing 
databases  under  various  scenarios. 
The use of subsumption and classification for querying databases is a form of
concept  matching.  Concept  matching  has  its  roots  in  the  frame  matching  retrieval 
methods  of  systems  such  as  KRL  [6],  pattern  matching  in  logic  databases  [92,133], 
and  graph  matching  [16,112].  Various  approaches  to  classification  are  described  in 
[9,86,87,24,83].  One  approach  is  being  applied  here  for  formally  computing  database 
queries  and  it  can  be  used  as  a  viable  way  for  querying  semantic  data  models. 

Querying  by  classification  is  the  process  of  specifying  a  query  object  in  the  same 
notation  as  data  objects,  and  then  searching  for  objects  that  are  structurally  related 
to  this  query  object.  Query  processing  is  based  on  deductive  inferencing  about  object 
structures  rather  than  a  procedural  specification  of  operations.  Query  specification 
is  entirely  declarative  in  that  the  user  need  not  provide  any  information  on  how  the 
query  is  to  be  executed.  The  user  concentrates  on  describing  the  desired  information. 
Inferencing  techniques  for  matching  are  defined  formally  by  the  subsumption  function 
which  determines  whether  one  object  class  is  a  subclass  of  another.  Since  subsumption 
is  a  form  of  terminological  reasoning  [83],  the  user  can  describe  a  query  in  terms  that 
may  be  different  from  the  exact  terms  under  which  the  desired  information  is  stored, 
so  long  as  the  meaning  is  similar.  This  contrasts  with  the  SQL-type  queries  in  which 
names  of  relations  and  attributes  must  be  precisely  specified.  Dealing  with  concepts 
also  implies  that  the  database  objects  are  not  only  descriptors  for  physical  data, 
but  must  also  represent  the  meaning  of  terms  used  to  describe  the  data.  Although 
semantic  data  models  are  capable  of  expressing  these  meanings,  this  aspect  has  not 
been  exploited  for  query  processing. 

Querying  using  classification  can  be  contrasted  with  the  operational  approach 
exemplified  by  SQL-type  query  languages.  In  the  operational  approach,  a  query  is 
mapped  into  a  sequence  of  algebraic  operations  applied  to  the  objects  in  the  database. 
Internally,  this  is  a  procedural  approach  since  the  plan  of  query  execution  involves  an 
exact sequence of operations. Externally, high-level data manipulation languages are
used  that  are  more  declarative  in  that  the  user  need  not  specify  these  steps.  However, 
the  user  must  still  have  a  thorough  understanding  of  the  database  schema  and  must 
know  the  names  of  objects  and  attributes.  Also,  the  user  must  be  able  to  conceptualize 
the  operations  in  general.  For  example,  the  user  must  specify  access  paths  or  describe 
which  relations  need  to  be  joined.  Several  data  manipulation  languages  based  on 
the  operational  approach  have  been  proposed  for  semantic  data  models  including 
GORDAS  [21],  GEM  [129],  ARIEL  [69],  an  entity-relationship  algebra  [85],  and  OQL 

Other  non-operational  approaches  to  querying  semantic  data  models  have  been 
proposed.  Zhu  and  Maier  [133]  describe  abstract  objects  for  TEDM.  These  abstract 
objects  take  a  form  that  is  very  similar  to  other  TEDM  database  objects,  and  they 
are  a  part  of  the  database,  acting  as  views.  They  can  be  matched  with  other  database 
objects through isomorphic relations defined on their attributes and values. In
CANDIDE, there is no distinction between permanent query objects and database objects.
Also, the subsumption function plays a stronger role, for example, by enforcing
constraints on insert operations. It is able to identify complex relationships in addition
to  the  type-preserving  isomorphic  mappings  (for  example,  it  can  determine  that  a 
male  child  is  a  son). 

ARGON  [88]  is  an  information  retrieval  system  built  on  top  of  KANDOR,  but 
it  still  had  to  process  some  queries  outside  the  classifier.  It  was  the  same  case  with 
RABBIT  [118].  But  due  to  the  extensions  in  CANDIDE  as  described  above,  it  is 
possible  to  handle  certain  queries  that  otherwise  had  to  be  processed  by  operations 
outside  the  classifier. 

Finally,  the  concept  matching  approach  can  be  contrasted  with  querying  in  logic 
databases.   The  search  by  the  classifier  is  directed  by  the  taxonomical  relationships 
among objects, and the terms being unified can be complex objects, but the
subsumption function is much less powerful than full first-order logic. Essentially, subsumption
can  be  viewed  as  a  constrained  inferencing  technique  compared  with  logical  queries. 
The  details  of  query  specification  and  processing  based  on  classification  will  now  be 
presented.

2.4.1 Query Specification and Processing

The  following  examples  illustrate  the  specification  of  query  objects  and  the  steps 
involved in processing queries using the university database with the class and
attribute hierarchy shown in Figure 2.2. The structure of some of the data objects
is  shown  in  Figure  2.3.  Query  objects  are  created  using  the  same  BNF  notation  as 
data  objects  (Figure  2.1).  Complex  queries  may  be  decomposed  into  several  nested 
query  objects. 

Once  a  query  object  has  been  specified,  classification  is  used  for  query  processing. 
In  classification,  all  the  superclasses,  subclasses,  and  instances  of  the  query  object  are 
identified.  The  result  of  the  query  is  the  instances  and  subclasses  of  the  query  object. 
The query object may, if so desired, be inserted permanently into the database.

Example 1: What courses offered by the college of engineering are taught by Smith?

Query

SUPERCLASS Course
ATTRIBUTE CONSTRAINTS
Instructor: EXACTLY 1 VALUE INSTANCE Smith
Department: EXACTLY 1 DOMAIN CLASS Engineering

This  Query  object  is  a  special  class  of  Course  in  which  the  Instructor  is  Smith  and  the 
Department  is  an  Engineering  department  such  as  Electrical  or  Mechanical.  Smith  (an 
instance)  and  Engineering  (a  class)  are  objects  that  are  already  part  of  the  database. 
To process this query, the classifier would begin at class Course, and try to find
more  specialized  classes  that  subsume  the  Query  class.  Since  there  are  no  classes  more 
specialized  than  Course,  there  is  no  additional  searching  in  this  example.  The  next 
step  is  to  test  each  instance  of  Course  to  see  which  can  meet  the  attribute  constraints 
of  the  Query  object,  namely  those  in  which  the  value  for  the  Instructor  attribute  is 
exactly "Smith," and for which the Department is some instance of Engineering.

Example 2: Find all courses which have at least 10 Engineering students enrolled.

Query1

SUPERCLASS Course
ATTRIBUTE CONSTRAINTS
a Students: ATLEAST 10 DOMAIN CLASS Query2

Query2

SUPERCLASS Student
ATTRIBUTE CONSTRAINTS
b Major: EXACTLY 1 DOMAIN CLASS Engineering

This is a nested query. Query1 is a specialization of Course in which at least 10
students are instances of class Query2. The value of 10 for the ATLEAST constraint
(line a) expresses cardinality. Query2 describes the class of all students majoring in
an Engineering department (line b). Query2 would first be classified to temporarily
create a new class having as its instances those Students majoring in Engineering.
Query1 is then classified to find courses with at least ten Students that are instances
of Query2.
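
The two-step evaluation of this nested query can be sketched in Python. The sketch only mimics the order of evaluation (inner query object first, then the cardinality test on the outer one); the function names at_least and query1, the instance data, and the use of plain sets for class extensions are assumptions made for illustration and are not CANDIDE code.

```python
def at_least(n: int, values, domain: set) -> bool:
    """ATLEAST n: at least n of the attribute's values belong to `domain`."""
    return sum(1 for value in values if value in domain) >= n


# Hypothetical instances.
engineering = {"Electrical", "Mechanical"}
students = {
    "ann": {"Major": "Electrical"},
    "bob": {"Major": "Mechanical"},
    "cy": {"Major": "History"},
}
courses = {
    "EE101": {"Students": ["ann", "bob", "cy"]},
    "HI200": {"Students": ["cy"]},
}

# The inner query object is evaluated first: its temporary extension is the
# set of students whose Major is an instance of Engineering.
query2 = {name for name, rec in students.items() if rec["Major"] in engineering}


def query1(threshold: int) -> list:
    """Courses with at least `threshold` students that are instances of the inner query."""
    return sorted(
        name for name, rec in courses.items()
        if at_least(threshold, rec["Students"], query2)
    )


print(query1(2))   # ['EE101']
print(query1(10))  # []
```

The threshold is a parameter here only so that the small sample data yields a non-empty answer; the query in Example 2 corresponds to a threshold of 10.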

Example 3: Find persons who instruct courses taken by persons majoring in
Computer Science.

Query1

CLASS DEFINED
SUPERCLASS Person
ATTRIBUTE CONSTRAINTS
Instructs: ATLEAST 1 DOMAIN CLASS Query2

Query2

CLASS DEFINED
SUPERCLASS Course
ATTRIBUTE CONSTRAINTS
a Students: ATLEAST 1 DOMAIN CLASS Query3

Query3

CLASS DEFINED
SUPERCLASS Person
ATTRIBUTE CONSTRAINTS
b Major: ATLEAST 1 VALUE INSTANCE Computer Science

This example illustrates aspects of terminological reasoning in that the Teacher and
Student classes are not mentioned directly in the query. They are located
automatically by the classifier. The objects are classified in a nested fashion that is similar
to  nesting  select-from-where  blocks  in  SQL.  Notice  that  the  nesting  also  provides  a 
form  of  path  specification  that  is  similar  to  a  natural  join  in  relational  algebra. 

Query3 represents the concept "persons majoring in Computer Science." Query3
is classified first, beginning the search at "Person," which is listed in the
SUPERCLASS list. Since Employee does not have a Major attribute, the subsumption
function fails on Employee (see Figures 2.2 and 2.3). It succeeds for Student, but cannot
satisfy any of the attribute constraints for Graduate or Undergraduate (since there is
no information provided that is specific to these subclasses). So, the classification
step ends for Query3 with its being subsumed by Student. Next, all instances of
Student are tested to see which can meet the constraint in line b of Query3, resulting
in the set of instances that contains all students majoring in Computer Science.

Query2  is  the  set  of  all  courses  containing  at  least  one  student  who  is  an  instance 
of  Query3  (line  a).  Since  there  are  no  classes  below  Course,  the  classifier  cannot  find 
any  more  specific  classes.  All  instances  of  Course  are  tested  against  Query2,  resulting 
in the desired set of instances.

Finally, Query1 is classified beginning at Person. Since Instructs is an Occupation
according to the attribute hierarchy (Figure 2.2), Employee subsumes Query1, but
Query1 cannot satisfy the Student attribute constraints. Similarly, subsumption fails
for Administrator and Secretary. It succeeds for Teacher since Instructs is below
Teaches in the attribute hierarchy. It cannot proceed further because the attribute
constraints for Professor and Instructor are not satisfied. All instances of Teacher
(which includes all instances of Professor and Instructor) are tested against Query1,
resulting in the set of instances of Teacher that teach at least one course which is an
instance of Query2. This final set satisfies the query.

Example  4:  Find  all  students  who  do  not  have  advisors  from  engineering,  but  who 
have  taken  either  at  least  2  engineering  courses  or  one  engineering  and  one  liberal 
arts  course. 

Query1

CLASS DEFINED
SUPERCLASSES Student
a SUBCLASSES Query2, Query3
ATTRIBUTE CONSTRAINTS
Advisor: EXACTLY 1 DOMAIN COMPOSITE
b     Department: ALL DOMAIN SETDIF {CLASS College, CLASS Engineering}

Query2

CLASS DEFINED
SUPERCLASSES Query1
ATTRIBUTE CONSTRAINTS
Courses: ATLEAST 2 DOMAIN CLASS Engineering

Query3

CLASS DEFINED
SUPERCLASSES Query1
ATTRIBUTE CONSTRAINTS
Courses: ATLEAST 1 DOMAIN CLASS Engineering
         ATLEAST 1 DOMAIN CLASS Liberal Arts
The "or" requirement is handled by creating two subclasses (line a) in Query1.
Alternately, this could also have been modeled by the SET construct. This is the equivalent
of a relational union. The negation for advisor is treated as a set difference (line b),
resulting in the set of values for advisor's department which includes every College
except Engineering. That is, negation is taken as the set of instances which does
not include the instances of the negated class. Query2 and Query3 are classified first,
creating temporary classes that are used in the classification of Query1. The SETDIF
construct is evaluated by checking the value of the Advisor attribute to see that its
Department is an instance of College but not an instance of Engineering.
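
The set semantics of this example (negation as set difference, "or" as the union of two subclasses) can be checked with ordinary set operations. The Python sketch below is an illustration under assumed data only: the instance records, the attribute key names, and the helper count_in are hypothetical, and class extensions are written out as explicit sets.

```python
def count_in(values, domain: set) -> int:
    """Number of attribute values that belong to `domain`."""
    return sum(1 for value in values if value in domain)


# Hypothetical class extensions.
college = {"History", "Electrical", "Mechanical"}
engineering = {"Electrical", "Mechanical"}
engineering_courses = {"EE101", "ME210"}
liberal_arts_courses = {"HI200"}

# Hypothetical Student instances.
students = {
    "ann": {"AdvisorDepartment": "History", "Courses": ["EE101", "ME210"]},
    "bob": {"AdvisorDepartment": "Electrical", "Courses": ["EE101", "ME210"]},
    "cy": {"AdvisorDepartment": "History", "Courses": ["EE101", "HI200"]},
    "dee": {"AdvisorDepartment": "History", "Courses": ["HI200"]},
}

# Line b: SETDIF {College, Engineering} is a set difference of extensions.
allowed = college - engineering

# The outer query object: students whose advisor is outside Engineering.
outer = {n for n, s in students.items() if s["AdvisorDepartment"] in allowed}

# The two subclasses (line a), each adding one alternative of the "or".
two_engineering = {
    n for n in outer
    if count_in(students[n]["Courses"], engineering_courses) >= 2
}
one_of_each = {
    n for n in outer
    if count_in(students[n]["Courses"], engineering_courses) >= 1
    and count_in(students[n]["Courses"], liberal_arts_courses) >= 1
}

# The answer is the union of the instances of the two subclasses.
result = two_engineering | one_of_each
print(sorted(result))  # ['ann', 'cy']
```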

Example  5:  Find  all  students  who  teach  other  students  with  a  GPA  greater  than 
3.5. 

Query

CLASS DEFINED
SUPERCLASS: Student
ATTRIBUTE CONSTRAINTS
a Teaches: ATLEAST 1 DOMAIN COMPOSITE
b     Students: ATLEAST 1 DOMAIN COMPOSITE
          GPA: EXACTLY 1 DOMAIN RANGE (3.5,4.0]

Here the Query object is of class Student. The attribute constraints are also
constraints on the Student domain, resulting in a recursive query. The nested
constraints on Students (line b) require at least one student with GPA greater than 3.5.
The constraints on Teaches (line a) require that at least one course be taught
that contains such a student. Although Student is not defined to have a Teaches
attribute in the database definition (Figure 2.3), TA, which is a subclass of Student,
has such an attribute. In query processing, this Query class would be subsumed by
TA. Thus, only instances that are both Students and Teachers will be tested for the
additional constraints.
Example 6: This example illustrates view creation. Suppose we wish to define a
new  concept  called  "Honors  Student,"  which  is  a  student  with  GPA  greater  than  3.5. 


Honors  Student 

CLASS  DEFINED 
SUPERCLASS  Student 
ATTRIBUTE  CONSTRAINTS 

GPA:  EXACTLY  1  DOMAIN  RANGE  (3.5,4.0] 

This  query  object  will  retrieve  all  honors  students  by  checking  all  instances  of  Student 
and  retrieving  those  meeting  the  constraints.  This  object  could  be  made  a  permanent 
part  of  the  database  resulting  in  view  creation.  Any  Student  instance  with  GPA 
greater  than  3.5  would  then  automatically  become  an  instance  of  this  class  during 
insert  or  modify  operations.  It  could  then  be  used  to  retrieve  honors  students  at 
any  time  without  having  to  reprocess  the  query.  Thus,  this  view  object  could  have 
been  used  to  compute  the  result  of  Example  5  by  rephrasing  the  query  as  "Find  all 
students  who  teach  honors  students,"  which  is  shown  below: 

Query

CLASS DEFINED
SUPERCLASS: Student
ATTRIBUTE CONSTRAINTS
Teaches: ATLEAST 1 DOMAIN COMPOSITE
    Students: ATLEAST 1 DOMAIN CLASS Honors Student
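
The view behavior described for Honors Student, in which membership is re-evaluated during insert or modify operations so that the view can be read later without reprocessing the query, can be sketched as follows. The class StudentStore and its upsert method are hypothetical names introduced for this illustration; the sketch keeps only GPA values and is not meant to reflect how CANDIDE stores instances.

```python
class StudentStore:
    """Keeps the extension of a permanent 'Honors Student' view up to date."""

    def __init__(self):
        self.gpa = {}        # student name -> GPA
        self.honors = set()  # extension of the Honors Student view

    def upsert(self, name: str, gpa: float) -> None:
        """Insert or modify a student and reclassify it against the view."""
        if not 0.0 <= gpa <= 4.0:
            raise ValueError("GPA must lie in RANGE [0.0,4.0]")
        self.gpa[name] = gpa
        if 3.5 < gpa <= 4.0:          # RANGE (3.5,4.0]
            self.honors.add(name)
        else:
            self.honors.discard(name)


store = StudentStore()
store.upsert("ann", 3.9)
store.upsert("bob", 3.5)      # 3.5 is excluded by the open lower bound
print(sorted(store.honors))   # ['ann']
store.upsert("ann", 3.2)      # a modify operation removes ann from the view
print(sorted(store.honors))   # []
```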

2.4.2 Summary of Query Processing Features

With respect to the above examples, the following features can be emphasized:

1.  The  declarative  nature  of  query  specification  is  evident  in  all  examples. 

2.  These  queries  can  be  compared  with  relational  algebra  in  the  sense  that: 

•  A  "join"  is  illustrated  in  examples  2,  3,  and  4. 
• "Selection" is illustrated in examples 1 and 5.

• 'Student.GPA' is "projected" in example 5 (line b).

•  Set  union  and  set  difference  are  shown  in  example  4. 

•  Examples  2,  3  and  4  illustrate  nested  queries. 

However,  this  approach  cannot  support  arbitrary  joins  since  the  classes  must 
be  related  through  some  navigation  path.  Thus,  not  all  queries  processed  in 
relational  algebra  can  be  supported.  Moreover,  join  with  itself  is  not  supported 
since  CANDIDE  has  no  variables  or  aliasing  capabilities. 

3.  View  definition  (example  6),  query  reuse  (examples  5  and  6  together),  and 
a  type  of  recursive  query  (example  5)  are  supported. 

4. Example 4 also illustrates query combination, which is simply the
generalization of subqueries into a "superquery."

5.  A  limited  form  of  negation  is  supported  based  on  the  SETDIF  construct  as 
shown  in  example  4.  SETDIF  is  a  set  difference  between  the  extensions  of  two 
classes.  This  can  be  extended  to  handle  arbitrary  sets  of  values.  For  example, 
"Age  not  between  21  and  25"  could  be  expressed  as  "Age:  ALL  DOMAIN 
SETDIF(INTEGER,RANGE[21,25])." 

2.5   Relationship  to  Relational  Databases 

This  section  shows  some  possible  mappings  between  a  relational  database  (RDB) 
and  its  equivalent  representation  in  CANDIDE. 

1.  Map  an  extract  of  an  RDB  onto  a  CANDIDE  database  (CDB).  Provide  all 
query  processing  through  the  CDB.  To  make  it  viable,  the  classification  process 
should be optimized. This is a workable solution for retrieval-intensive databases
where  updates  can  be  applied  periodically  and  a  new  extract  created  after  each 
update. 

2. Do a total mapping of an RDB into a CDB and support queries simultaneously
on both databases. This again requires that classification be optimized.
Furthermore, if on-line updating is supported, then each update must be correctly
reflected in both the databases.

3.  A  third  option,  which  coincides  with  the  current  trend  in  optimizing  logic  queries 
[31],  is  to  map  the  queries  in  a  CDB  after  going  through  the  classifier  into 
appropriate  minimal  SQL  queries.  The  execution  of  such  queries  is  then  left  to 
the  optimizer  within  the  RDB. 

The following steps are involved in mapping a relational schema into the
CANDIDE data model:

1.  Each  relation  becomes  a  class,  and  the  attributes  of  the  relation  each  become 
an  attribute  constraint  with  the  EXACTLY  1  constraint  to  the  domain  of  the 
attribute. Relations with multiple attributes as keys may be transformed into
CANDIDE classes with multi-valued attributes.

2. Tuples are mapped into instances of a class. More than one tuple may be
combined into a single instance. For example, consider a relation DEPARTMENT
and  another  called  DEPT-LOCATION;  assume  that  a  department  may  have 
multiple  locations: 


DEPARTMENT(Dept-No,Dname,Dbudget) 
DEPT-LOCATION(Dept-No,Dloc)

Here the department locations are stored under attribute Dloc. These two
relations may be mapped to a single CANDIDE class object Department with
attribute Dloc, which may have multiple values:


Department 

CLASS  DEFINED 

ATTRIBUTE  CONSTRAINTS 

Dept_No:  EXACTLY  1  DOMAIN  INTEGER 
Dept_Name:  EXACTLY  1  DOMAIN  STRING 
Dbudget:  EXACTLY  1  DOMAIN  REAL 
Dloc:  ALL  DOMAIN  STRING 


Since  CANDIDE  instances  are  in  non-first  normal  form,  it  is  possible  to  combine 
all  this  information  into  one  data  structure. 

3. One additional attribute constraint is added to the class for a relation for every
association to another relation through a foreign key.

4.  The  key  of  a  relation  is  associated  with  the  class  name. 

5.  A  class  for  a  relation  can  have  subclasses  to  express  generalization  and  special- 
ization. Such  a  relationship  is  suggested  by  common  keys.  For  example,  both 
PERSON  and  STUDENT  relations  may  have  SSN  (social  security  number)  as 
a  key. 

6.  The  values  within  an  attribute  domain  become  instances  of  a  class  representing 
that  attribute  (except  for  primitive  values  such  as  integer  or  string).  This 
enables  complex  domains  to  be  built  for  the  relational  attributes. 
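
Step 2 of the mapping (several tuples combined into one instance) can be sketched as follows. This is a minimal illustration in Python, not the CANDIDE implementation: the function name merge_departments, the dictionary layout, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows for the two relations named in the text.
DEPARTMENT = [(10, "Agronomy", 250000.0), (20, "Entomology", 180000.0)]
DEPT_LOCATION = [(10, "Building A"), (10, "Building B"), (20, "Building A")]


def merge_departments(departments, locations):
    """Combine DEPARTMENT and DEPT-LOCATION tuples into nested instances.

    Each result holds one value for Dept_No, Dept_Name and Dbudget
    (EXACTLY 1) and a list of values for Dloc (multi-valued).
    """
    locs = defaultdict(list)
    for dept_no, dloc in locations:
        locs[dept_no].append(dloc)
    return [
        {"Dept_No": no, "Dept_Name": name, "Dbudget": budget, "Dloc": locs[no]}
        for no, name, budget in departments
    ]


instances = merge_departments(DEPARTMENT, DEPT_LOCATION)
```

Because the instance is not in first normal form, the join on Dept-No is performed once at mapping time rather than at every query.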

With  the  above  mapping,  a  semantic  layer  is  created  on  top  of  an  underlying 
relational database to simplify the end-user interface. Since a CANDIDE schema
has more semantic information than the corresponding relational schema (e.g., in
terms of attribute constraints, generalization, and so forth), this information must be
externally supplied over and above the relational schema definition. Note also that
the  inverse  process  of  mapping  the  data  model  into  the  relational  model  cannot  be 
done  without  loss  of  information. 

2.6   Natural  Language  Processing 

Traditionally,  natural  language  query  systems  are  interfaced  to  a  database  man- 
agement system  through  an  intermediate  query  language.  Even  in  the  most  recent 
systems  such  as  DATATALKER  [8]  and  KID  [47],  which  do  support  a  rich  model 
for  domain  semantics,  the  objective  is  to  translate  the  natural  language  query  into 
a  formal  query  language  such  as  SQL.  There  are  a  number  of  disadvantages  to  this 
approach.  The  sentences  accepted  by  the  system  can  be  no  more  expressive  than 
the  formal  language;  thus,  the  interface  is  limited  to  the  scope  of  SQL  expressions. 
Also,  there  is  poor  integration  between  the  database  and  the  language  understand- 
ing subsystems.  This  means  that  much  of  the  semantic  information  in  the  database 
structure  cannot  be  used  to  interpret  sentences.  Finally,  translation  into  the  formal 
query  language  may  be  an  awkward  task  since  most  database  query  languages  are 
not  designed  for  representing  natural  language  expressions. 

In  contrast,  it  is  easier  to  map  the  structure  of  natural  language  sentences  into 
the  objects  of  a  data  model  such  as  the  one  presented  here  than  to  map  it  into  query 
languages  like  SQL.  Since  the  objects  also  act  as  the  query  language,  the  additional 
step  of  converting  into  another  formal  query  language  is  eliminated.  Furthermore, 
the  language  processor  can  use  database  objects  to  interpret  sentences.  Thus,  the 
semantics  of  the  database  play  an  active  and  direct  role  in  language  understanding. 

The  approach  to  natural  language  understanding  presented  here  is  based  on  in- 
tegrating syntax,  semantics,  and  domain  knowledge.    This  integration  allows  many 
sources of information to be applied to understanding queries, reducing ambiguity
and  enhancing  meaning. 

This approach uses very tight coupling between the natural language processor
(NLP)  and  the  database  management  system  (DBMS).  This  contrasts  with  loose 
coupling,  which  is  the  approach  often  taken  in  portable  natural  language  systems. 
Tight  coupling  assures  that  the  semantics  of  the  database  can  play  a  strong  role  in 
language  processing.  Without  tight  coupling,  the  language  processor  has  only  limited 
knowledge  about  the  domain  of  discourse  and  cannot  be  as  robust. 

Furthermore,  this  approach  lends  itself  to  large  scale,  heterogeneous  databases. 
Typically,  natural  language  systems  are  developed  for  "narrow  domains."  Unfor- 
tunately, such  narrow  domains  are  a  myth.  Even  in  databases  constrained  to  one 
subject  area,  the  diversity  of  language  usage  is  still  tremendous.  In  large  databases, 
the  natural  language  processor  can  be  assisted  by  the  overlapping  of  information. 
Language  usage  learned  in  one  domain  can  be  applied  to  another. 

Natural  language  processing  is  not  without  difficulties.  None  of  the  existing  sys- 
tems are  very  robust,  except  for  a  few  that  operate  on  relatively  simple  databases 
[8,116].  The  biggest  bottleneck  is  the  lexical  acquisition  problem.  The  language  pro- 
cessor must  not  only  have  a  large  vocabulary,  it  must  also  acquire  a  great  deal  of 
knowledge about all the ways each word can be used. The lexical acquisition problem is
the  focus  of  attention  in  subsequent  chapters. 

The  NLP  system  described  here  is  being  developed  as  part  of  a  user  interface  to 
an  information  retrieval  system  covering  a  range  of  agricultural  topics  [3].  The  user 
interface  will  provide  multimedia  access  to  information  in  the  form  of  text,  computer 
graphics,  and  digitized  images  [109].  A  prototype  of  the  NLP  and  DBMS  system 
has  been  implemented  in  the  language  C.  The  current  version  has  a  vocabulary  of 
2000 words and a database with 1000 objects. The syntactic analyzer is based on a
grammar for English that has 350 rules with broad, domain-independent coverage.
The  language  processor  essentially  involves  three  phases: 

1. Parse the input sentence into a predicate using the Lexical-functional grammar
formalism  (Syntactic  Analysis) 

2.  Map  the  sentence  predicate  into  a  Query  Object  (Semantic  Analysis) 

3.  Process  Query  Object  using  the  Classifier 

CANDIDE  plays  a  central  role  in  the  natural  language  processing  system.  CANDIDE 
organizes  and  stores  data,  and  also  provides  a  representation  for  word  meaning  used 
by  the  semantic  analyzer.  The  terminological  reasoning  components  infer  associations 
between  words. 

2.6.1   Syntactic  Analysis  using  LFG

The  first  phase  of  the  language  processor  analyzes  the  syntactic  structure  of  the 
input  query.  The  prototype  system  uses  the  Lexical-functional  grammar  formalism 
(LFG) [51]. LFG is a member of a class of grammar formalisms known as unification
grammars  [107].  These  grammars  are  based  on  augmented  context-free  grammar 
rules.  They  use  an  attribute-value  pair  notation  for  various  grammatical  features. 
This  notation  is  very  compatible  with  the  structure  of  data  objects.  It  is  therefore 
possible  to  use  the  data  model  to  represent  and  store  a  complex  grammar,  thus  leading 
to  a  completely  integrated  system  [95,37,52].  Currently,  the  prototype  maintains 
grammar  rules  as  separate  entities. 

A typical grammar rule in LFG is of the form:

s → np (↑ Subject = ↓), vp (↑ = ↓).

This says that one form for a sentence (s) is a noun phrase (np) followed by a verb
phrase (vp). The terms after np and vp are the functional forms which augment the
grammar rule. Each term on the right hand side of a grammar rule corresponds to
a  node  in  the  parse  tree.  The  arrows  point  up  and  down  the  parse  tree  at  this  node 
to  indicate  the  relationship  between  constituents  in  the  sentence.  For  example,  one 
arrow  points  to  the  noun  phrase  as  the  Subject  of  the  sentence.  Another  arrow  points 
to  the  verb  phrase  as  the  main  constituent  of  the  sentence. 

Another  major  component  of  the  LFG  parser  is  the  lexicon.  A  typical  lexical 
entry  is: 

supervised, pastp, predicate = Supervise(↑ Subject, ↑ Object)

A  word  can  have  several  different  entries  corresponding  to  different  usages.  The 
lexicon  contains  grammatical  information  associated  with  each  word.  For  example, 
"supervised"  is  a  past  participle  (pastp).  The  predicate  associates  a  functional  form 
with  the  verb  that  shows  the  argument  structure.  In  this  case  "supervised"  must 
have  a  Subject  and  an  Object.  The  symbol  "Supervise"  appearing  in  the  predicate 
is  the  name  of  an  object  in  the  database.  Thus,  the  lexical  entry  is  the  first  step  in 
mapping  words  to  objects. 

In parsing, the context-free portion of the grammar is used by a very rapid
LR-parsing algorithm [117] which generates a forest of parse trees. Then the functional
forms  are  evaluated  for  each  tree.  Many  ill-formed  parses  are  rejected  in  the  process. 
The  output  of  the  LFG  parsing  process  is  a  predicate  that  relates  the  constituents  of 
the  sentence.  The  predicate  is  then  passed  to  the  semantic  analyzer. 

Example  7:  The  sentence,  "Which  students  are  supervised  by  professors  in  engi- 
neering" has  the  following  parse  tree: 

SENTENCE
  NOUN PHRASE
    ADJECTIVE  Which
    NOUN  Students
  VERB PHRASE-PASSIVE
    AUX-BE  Are
    VERB-PAST-PART  Supervised
    PREPOSITIONAL PHRASE-BY
      PREPOSITION  By
      NOUN PHRASE
        NOUN  Professors
        PREPOSITIONAL PHRASE
          PREPOSITION  In
          NOUN  Engineering

The  fact  that  this  sentence  is  in  passive  voice  is  detected  by  grammar  rules  for  the 
passive  verb  phrase.  The  functional  forms  cause  the  first  noun  phrase  to  be  identified 
as  the  Object,  and  the  argument  of  the  BY  prepositional  phrase  becomes  the  Subject. 
The  predicate  form  for  this  sentence  is: 

Supervise(Professor  [Engineering], Student  [Which]) 

This  says  that  the  main  verb  "Supervise"  has  a  subject  "Professor"  and  an  object 
"Student."   The  brackets,  [],  indicate  modifiers.   The  argument  of  the  "In"  preposi- 
tional phrase  modifies  "Professor,"  and  the  adjective  "Which"  modifies  "Student." 
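
The role assignment in Example 7 can be sketched in a few lines of Python. This is only an illustration of the effect of the passive functional forms, not the LFG parser itself; the function names build_predicate and show, and the representation of a noun phrase as a (head, modifier) pair, are assumptions introduced for the example.

```python
def build_predicate(verb, first_np, second_np, passive):
    """Return (verb, subject, object) with roles assigned by voice.

    Each noun phrase is a (head, modifier) pair. In a passive sentence the
    first noun phrase is the Object and the noun phrase from the BY phrase
    (passed as second_np) becomes the Subject.
    """
    if passive:
        subject, obj = second_np, first_np
    else:
        subject, obj = first_np, second_np
    return (verb, subject, obj)


def show(predicate):
    """Render the predicate in the bracket notation used in the text."""
    verb, subject, obj = predicate

    def fmt(np):
        head, modifier = np
        return f"{head} [{modifier}]" if modifier else head

    return f"{verb}({fmt(subject)}, {fmt(obj)})"


pred = build_predicate(
    "Supervise",
    first_np=("Student", "Which"),
    second_np=("Professor", "Engineering"),
    passive=True,
)
rendered = show(pred)
```

With passive=False the same call would keep the first noun phrase as the Subject, which is the only difference between the two voices in this sketch.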

2.6.2   Building  a  Query  Object

The  output  from  the  syntactic  analysis  phase  is  a  predicate  that  shows  the  rela- 
tionships among  the  constituents  of  the  sentence.  It  is  not  accidental  that  these  re- 
lationships are  the  same  relationships  that  occur  among  the  objects  in  the  database. 
The  structure  of  database  objects  parallels  our  way  of  thinking  and  speaking  about 
entities  in  the  world  that  the  database  models.  It  is  thus  relatively  easy  to  map  the 
predicate  structure  of  the  natural  language  sentence  to  database  objects.  This  is  the 
step  that  is  facilitated  by  the  semantic  content  of  the  database  model. 

The predicate structure must be transformed into a Query Object. There are
many  possible  structures  for  the  Query  Object,  some  better  than  others.  A  small  set 
of  heuristic  rules  is  used  in  the  transformation  for  creating  a  Query  Object  which  will 
lead  to  near  optimal  query  processing.  The  process  consists  of  first  converting  the 
predicate  into  a  small  semantic  network,  and  then  selecting  a  node  in  this  network  to 
act  as  the  head  concept  of  the  Query  Object.  Concepts  attached  to  the  head  concept 
must  be  mapped  to  attributes  and  values  of  the  Query  Object.  The  process  is  assisted 
by  using  objects  from  the  database  to  impose  a  structure  on  the  Query  Object.  Notice 
that  there  is  no  attempt  made  to  associate  parts  of  speech  with  constructs  of  the  data 
model  as  has  been  tried  in  other  approaches  [13,67]. 

Example 8: The predicate from Example 7 can be converted into a network:

             Supervise
              |     |
Which -- Student   Professor -- Engineering

The  predicate  forms  a  network  with  "Supervise"  linked  directly  to  its  two  arguments 
and  each  argument  linked  to  its  modifier.  The  network  shows  that  different  nodes  are 
related  in  some  way,  but  it  is  necessary  to  determine  how  they  are  related  by  using  the 
data  model.  First,  the  node  "Which"  is  identified  as  a  special  "Question  Node."  Only 
a  few  words  can  act  as  Question  Nodes  (Which,  What,  Why,  When,...).  "Which"  acts 
as  a  marker  for  the  attached  node  "Student,"  indicating  that  "Student"  is  the  node 
of  interest.  A  heuristic  rule  for  Question  Nodes  is  activated  that  makes  "Student" 
the  SUPERCLASS  concept  of  the  Query  Object.  The  "Student"  class  object  is 
then  retrieved  as  a  template  into  which  other  components  of  the  network  must  be 
mapped.  Spreading  further  in  the  network,  it  is  necessary  to  find  an  association 
between  "Supervise"  and  "Student."  The  attribute  "Advisor"  in  the  "Student"  class 
object is associated with "Supervise" by the terminological reasoner (there is an
association in the database between these concepts). Thus, "Supervise" becomes an
attribute restriction. "Professor" matches the domain of "Advisor." The "Professor"
class  object  is  retrieved,  and  "Engineering"  is  associated  with  "Professor"  via  the 
"Department"  attribute  of  that  object.  This  process  results  in  the  following  Query 
Object: 


Query1

DEFINED 

SUPERCLASSES:  Student 
ATTRIBUTE  RESTRICTIONS 
Supervisor:  ATLEAST  1  Query2 

Query2 

DEFINED 

SUPERCLASSES:  Professor 
ATTRIBUTE  RESTRICTIONS 

Department:  ATLEAST  1  Engineering 


This  Query  Object  can  now  be  Classified  to  process  the  query. 
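
The transformation in Example 8 can be approximated by the following Python sketch. It is an illustration under stated assumptions rather than the prototype's algorithm: the tables SCHEMA, VERB_ATTRIBUTE, and INSTANCE_OF stand in for lookups that the text attributes to the database and the terminological reasoner, and the attribute name Supervisor follows the Query Object listing above.

```python
# Hypothetical schema fragment: class -> {attribute: domain class}.
SCHEMA = {
    "Student": {"Supervisor": "Professor"},
    "Professor": {"Department": "Department"},
}
# Hypothetical verb-to-attribute associations (stand-in for the reasoner).
VERB_ATTRIBUTE = {("Supervise", "Student"): "Supervisor"}
# Hypothetical instance-to-class membership.
INSTANCE_OF = {"Engineering": "Department"}
QUESTION_WORDS = {"Which", "What", "Why", "When"}


def build_query(predicate):
    """Turn (verb, subject, object) into nested query objects."""
    verb, subject, obj = predicate
    # Heuristic: the argument marked by a question word becomes the head.
    head, other = (subject, obj) if subject[1] in QUESTION_WORDS else (obj, subject)
    if head[1] not in QUESTION_WORDS:
        raise ValueError("no question node found")
    # The verb is mapped to an attribute of the head class.
    attribute = VERB_ATTRIBUTE[(verb, head[0])]
    if SCHEMA[head[0]][attribute] != other[0]:
        raise ValueError("argument does not match the attribute domain")
    inner = {"SUPERCLASSES": other[0], "ATTRIBUTE RESTRICTIONS": {}}
    if other[1]:
        # Attach the modifier through the attribute whose domain it belongs to.
        for attr, domain in SCHEMA[other[0]].items():
            if INSTANCE_OF.get(other[1]) == domain:
                inner["ATTRIBUTE RESTRICTIONS"][attr] = ("ATLEAST", 1, other[1])
    return {
        "SUPERCLASSES": head[0],
        "ATTRIBUTE RESTRICTIONS": {attribute: ("ATLEAST", 1, inner)},
    }


query1 = build_query(
    ("Supervise", ("Professor", "Engineering"), ("Student", "Which"))
)
```

The nested dictionary plays the role of Query2 inside Query1; a real system would hand the result to the classifier instead of returning it.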

2.7   Evaluation 

An  implementation  of  the  CANDIDE  database  management  system  and  natural 
language  processor  was  performed  on  a  SUN  3/140  workstation.  A  high  level  object 
editor  and  database  browser  was  written  using  the  POPLOG  development  environ- 
ment. A  database  management  system  (objects  and  query  processor)  was  originally 
written  in  PROLOG,  and  later  rewritten  in  C.  A  database  containing  1000  objects  was 
created  covering  six  different  subject  areas.  The  constructs  provided  by  CANDIDE 
were  generally  adequate  for  modeling  these  domains.  A  great  deal  of  integration 
among  the  domains  was  achieved,  an  aspect  that  will  be  important  in  building  large 
information  retrieval  systems. 

The classifier has led to improvements in query processing because of its ability to
exploit database semantics through terminological reasoning. Although CANDIDE
has been successful as a modeling tool for real data from the application domain,
there is a big problem with PRIMITIVE classes. The classifier cannot reason about
PRIMITIVE classes, but results from modeling real domains show that almost 75%
of the classes in the application domain are PRIMITIVE. That is, it is very difficult
to  build  class  descriptions  based  on  necessary  and  sufficient  conditions  which  apply 
to  most  real  data. 

Furthermore, even if it were possible to create a DEFINED class that applied to
many instances, it could not apply to all instances in the class. There were always
exceptions. That is, there were always instances that did not conform to the structure of
the DEFINED class, yet they clearly were members of the class. For example, a
student who was not enrolled in a course could not be classified under Student. Exceptions
were quite common, but the data model was inflexible in handling exceptions.
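
The student example can be made concrete with a small Python sketch of membership by necessary and sufficient conditions. The function satisfies, the attribute name Enrolled_In, and the two sample instances are hypothetical; the sketch only mimics ATLEAST restrictions and is not the CANDIDE classifier.

```python
def satisfies(instance, definition):
    """True if the instance meets every ATLEAST restriction in the definition.

    `definition` maps an attribute name to the minimum number of values
    the instance must hold for that attribute.
    """
    return all(
        len(instance.get(attr, [])) >= minimum
        for attr, minimum in definition.items()
    )


# Hypothetical DEFINED class: a Student is enrolled in at least one course.
STUDENT = {"Enrolled_In": 1}

typical = {"Name": ["Ann"], "Enrolled_In": ["Course 101"]}
not_enrolled = {"Name": ["Bob"], "Enrolled_In": []}

typical_is_student = satisfies(typical, STUDENT)
exception_is_student = satisfies(not_enrolled, STUDENT)
```

The second instance is rejected even though it clearly describes a student, which is exactly the inflexibility described above.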

Finally,  there  is  a  problem  of  dealing  with  incomplete  information.  The  classifier 
cannot  operate  well  unless  a  complete  description  of  a  class  is  provided.  This  is 
rarely  the  case.  In  particular,  natural  language  expressions  provide  a  minimum  of 
information,  and  the  rest  must  be  inferred  through  default  reasoning  processes.  Many 
situations  demand  retrieval  of  information  based  not  only  on  incomplete  descriptions, 
but  descriptions  that  are  not  even  direct.  This  occurs  often  in  everyday  language 
when  metaphors  are  used.  Another  example  is  in  design  where  a  new  problem  may 
be  related  partially,  but  incompletely,  to  previous  design  solutions.  It  is  desirable  to 
retrieve  previous  solutions  and  adapt  them  to  the  new  situation. 

These  problems  associated  with  the  use  of  classification  and  semantic  data  mod- 
eling are  summarized  as  follows: 

1. There is something fundamentally wrong with the notion that class membership
can be determined by necessary and sufficient conditions.

2. The system must be able to recognize and accommodate exceptions.

3. Other forms of reasoning must be supported, including queries based on partial
information, default reasoning, and analogical reasoning.

For  the  natural  language  processor,  a  set  of  350  grammar  rules  was  developed,  and 
a  lexicon  was  created  having  a  2000  word  vocabulary.  The  LFG  analyzer  was  found 
to  be  very  robust  with  only  a  relatively  small  number  of  grammar  rules.  Based  on  a 
sample  of  300  questions  obtained  from  volunteers,  the  grammar  rules  were  sufficient 
to  correctly  parse  80%  of  the  sentence  structures  used  in  stating  database  queries. 
However, the small vocabulary accounted for a high degree of failure. 90% of the
sample questions contained words which were not in the lexicon. This is not surprising,
since it was expected that a vocabulary of at least 10,000 words would be needed.
What is more serious is that there is a great diversity in the number of ways a word
can be used. That is, just having a large vocabulary is not enough; there must be an
extensive amount of knowledge associated with each word.
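
The vocabulary failure rate reported here is the fraction of questions containing at least one word missing from the lexicon. A minimal Python sketch of that measurement follows; the tokenization, the function names, and the two sample questions are assumptions for illustration and do not reproduce the original 300-question sample.

```python
def out_of_vocabulary(question, lexicon):
    """Return the words of the question that are missing from the lexicon."""
    words = [w.strip(".,?!").lower() for w in question.split()]
    return [w for w in words if w and w not in lexicon]


def failure_rate(questions, lexicon):
    """Fraction of questions with at least one out-of-vocabulary word."""
    failed = sum(1 for q in questions if out_of_vocabulary(q, lexicon))
    return failed / len(questions)


# Hypothetical lexicon and questions.
lexicon = {"which", "students", "are", "supervised", "by",
           "professors", "in", "engineering"}
questions = [
    "Which students are supervised by professors in engineering?",
    "Which professors are tenured?",
]
rate = failure_rate(questions, lexicon)
```

A measure of this kind counts only missing words; it says nothing about words that are present but used in a sense the lexicon does not record, which is the more serious problem noted above.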

This  preliminary  work  with  the  natural  language  processor  led  to  the  following 
conclusions: 

1.  Lexical  acquisition  is  the  biggest  bottleneck  in  building  practical  natural  lan- 
guage systems.  Acquisition  must  be  semi-automated,  not  only  because  of  the 
enormous  size  of  the  task,  but  because  usage  is  constantly  changing.  The  nat- 
ural language  system  must  constantly  adapt  to  new  usage. 

2.  There  is  a  tremendous  amount  of  information  associated  with  a  particular  lexical 
item.  Current  natural  language  processing  systems  use  representations  for  words 
which are relatively impoverished in that they contain very little information
on  each  word. 

3.  Although  the  grammar  could  account  for  80%  of  the  observed  sentence  struc- 
tures, covering  the  remaining  20%  is  not  simply  a  matter  of  adding  more  rules. 
Rather,  there  will  always  be  grammatical  structures  that  are  understandable 
but  are  outside  the  scope  of  any  fixed  set  of  grammar  rules.  This  opens  the 
whole  question  about  whether  natural  language  obeys  rules. 

These problems led to a re-evaluation of the underlying approach used to represent
database classes and linguistic knowledge. It was found that many of these
problems had a common basis. Namely, the underlying representation was based on
an incomplete theory of categorization. In the following chapters, it is proposed that
a better theory of categorization will lead to improvements in data modeling, and at
are  directly  related. 


CHAPTER  3 
CATEGORIES  AND  WORD  MEANING 

3.1    Category  Theory  and  the  Representation  of  Word  Meaning 

This  chapter  presents  the  main  theory  behind  the  terminological  knowledge  rep- 
resentation system.  It  includes  a  survey  of  ideas  from  psychology  and  philosophy 
pertaining  to  the  analysis  and  representation  of  word  meaning.  The  motivation  be- 
hind this  survey  is  to  provide  justification  for  a  particular  viewpoint  which  will  be 
called  the  category  theory  of  word  meaning.  An  informal  presentation  of  the  category 
theory  is  given  in  this  section.  A  formal  presentation  is  given  in  the  next  chapter. 

It  is  troublesome  even  to  describe  what  is  meant  by  word  meaning.  The  category 
theory  does  not  depend  on  the  view  that  words  have  identifiable  meanings  attached  to 
them.  That  view  is  a  popular  notion  fostered  largely  by  dictionaries.  As  will  be  seen 
below,  dictionaries  offer  a  shallow  view  of  word  meaning.  In  category  theory,  words 
obtain  meaning  through  their  usage.  Agents,  either  animals  or  computers,  capable 
of  understanding  language  are  said  to  understand  what  a  word  means  when  they  are 
able  to  use  words  to  communicate,  either  in  understanding  or  creating  utterances. 
This  idea  can  be  formalized  through  the  creation  of  computer  programs  that  process 
natural  language.  How  are  words  handled  in  the  design  of  a  computer  program  which 
is  capable  of  processing  natural  language? 

The  main  premise  in  the  category  theory  is  that  new  utterances  are  understandable 
because  of  their  similarity  to  previous  utterances.  Agents  capable  of  understanding 
language  maintain  a  vast  collection  of  these  previous  utterances  in  memory.  Un- 
derstanding a  new  utterance  consists  of  retrieving  utterances  from  memory  that  are 
similar in some way (phonetic, morphologic, syntactic, semantic) and applying a mapping
between the retrieved utterances and the new utterance. This process is related
to a branch of artificial intelligence known as case-based reasoning. In the category
theory,  an  important  analysis  tool  is  the  examination  of  a  large  corpus  of  text  con- 
taining many  instances  (these  would  be  cases)  in  which  a  word  is  used.  This  process, 
known  as  lexical  acquisition,  is  the  way  knowledge  about  word  use  is  acquired. 
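
The retrieval step can be sketched as a nearest-case lookup over feature sets. The sketch below is written in Python under loud assumptions: the text does not specify a similarity measure, so Jaccard overlap is used as a stand-in, and the feature encoding and the two stored cases are invented for the example.

```python
def similarity(a, b):
    """Jaccard overlap between two feature sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(new_case, memory, k=1):
    """Return the names of the k stored cases most similar to new_case."""
    ranked = sorted(
        memory,
        key=lambda name: similarity(new_case, memory[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical case memory: utterance -> set of features.
memory = {
    "professor supervises student": {"verb:supervise", "subj:professor", "obj:student"},
    "student takes course": {"verb:take", "subj:student", "obj:course"},
}
new_case = {"verb:supervise", "subj:manager", "obj:employee"}
best = retrieve(new_case, memory)
```

The retrieved case would then be mapped onto the new utterance; that mapping step is outside the scope of this sketch.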

Memory  consists  of  a  vast  number  of  cases  of  utterances.  Similar  utterances  are 
grouped  together  into  a  category.  Utterances  are  similar  either  because  of  empirical 
properties  or  because  they  conform  to  a  theory.  The  structure  of  categories  provides 
the  mechanism  for  determining  to  which  category  a  new  utterance  belongs.  Categories 
serve  one  very  important  purpose.  They  provide  missing  information  (default  values 
or  typical  values).  If  I  say,  "I  own  a  dog,"  you  will  probably  conclude  (justifiably  if  not 
correctly)  a  host  of  properties  which  I  did  not  mention  (has  four  legs,  has  fur,  barks,  I 
own  a  dog  license).  This  information  is  provided  by  the  category  connected  with  the 
word "dog." Categories provide a way of summarizing our beliefs and expectations
about  the  world. 
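
The way a category supplies missing information can be sketched as a dictionary of defaults that stated facts override. The Python below is illustrative only; the table CATEGORY_DEFAULTS, its property names, and the function describe are assumptions.

```python
# Hypothetical default (typical) values attached to a category.
CATEGORY_DEFAULTS = {"dog": {"legs": 4, "covering": "fur", "sound": "bark"}}


def describe(category, stated):
    """Fill unstated properties from the category defaults.

    Explicitly stated facts take precedence, so a default is a
    justified expectation rather than a guarantee.
    """
    return {**CATEGORY_DEFAULTS.get(category, {}), **stated}


my_dog = describe("dog", {"owner": "me"})
three_legged = describe("dog", {"owner": "me", "legs": 3})
```

The second call shows a default being overridden, which corresponds to a conclusion that was justified but not correct.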

Utterances are similar either because of empirical properties or because they conform
to a theory. That words have similar phonetic properties is an empirical observation.
That in English a typical sentence consists of a noun phrase followed by a
verb  phrase  is  a  theory.  All  rules  of  grammar  are  theories.  That  mammals  have  fur 
is  a  theory.  Empirically  similar  entities  form  a  category  by  virtue  of  their  common 
properties.  A  theory  also  forms  a  category  consisting  of  the  entities  which  conform 
to  the  theory.  Psychologists  call  these  theories  cognitive  models  [61]. 

The  thing  about  theories  which  makes  life  interesting  is  that  they  are  very  often 
wrong.  Every  theory  (and  category)  has  exceptions.  Exceptions  are  the  spice  of  life. 
Not  all  birds  fly,  not  all  mammals  have  fur,  and  all  grammars  leak.  The  classic  view 
of categories ignores this point and gives rules the supreme advantage. The classic
view says that membership in a category is determined by a conjunction of necessary
and sufficient conditions. Although this view was destroyed by Wittgenstein's Family
Resemblance argument [125], it is still very popular. For example, modern theories of
grammar are dedicated to the notion of categories, yet they are based on the classic
view of categories.

The  new  category  theory  attempts  to  provide  a  more  realistic  view  of  categories. 
It  is  characterized  by  an  ability  to  recognize  and  accommodate  exceptions.  Each  new 
case  added  to  memory  results  in  an  incremental  modification  of  memory  structure 
to  accommodate  the  case  that  may  be  a  radical  exception.  Categories  will  often 
be  created  by  grouping  cases  that  in  total  have  little  or  nothing  in  common,  thus 
producing  the  family  resemblance  effect.  A  new  case  may  be  added  to  a  category 
not  because  it  conforms  to  a  cognitive  model,  but  because  it  is  similar  to  some  cases 
already  in  the  category. 
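
The family resemblance effect can be reproduced with a short Python sketch in which a case joins a category when it resembles at least one existing member. The threshold of two shared features, the abstract feature names, and the function names are assumptions chosen for the illustration; the formal procedure is the subject of the next chapter.

```python
def resembles(case, member, threshold=2):
    """True if the two cases share at least `threshold` features."""
    return len(case & member) >= threshold


def add_by_resemblance(category, case, threshold=2):
    """Add the case if it resembles at least one existing member."""
    if any(resembles(case, m, threshold) for m in category):
        category.append(case)
        return True
    return False


# Abstract features: each new case overlaps only with its neighbor.
category = [{"a", "b", "c"}]
added = [
    add_by_resemblance(category, {"b", "c", "d"}),
    add_by_resemblance(category, {"c", "d", "e"}),
    add_by_resemblance(category, {"d", "e", "f"}),
]
rejected = add_by_resemblance(category, {"x", "y"})

# Features shared by every member of the resulting category.
common = set.intersection(*category)
```

All three cases are admitted, yet the set of features common to every member is empty, so no conjunction of necessary and sufficient conditions describes the category.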

Cognitive  models  do  play  a  role  in  the  new  category  theory.  They  behave  exactly 
like  scientific  hypotheses.  A  category  can  contain  many  competing  models,  just  as  in 
science  there  are  many  competing  theories  to  explain  a  particular  phenomenon.  The 
cases,  just  like  experimental  observations,  may  or  may  not  conform  to  a  particular 
model.  Scientific  principles,  just  as  category  structure,  undergo  constant  anarchy  and 
revolution.  In  categories,  cases  are  constantly  at  battle  with  cognitive  models.  Models 
attempt  to  describe  cases,  and  cases  constantly  defy  models  through  exceptions. 

This  is  a  brief  description  of  the  category  theory  of  word  meaning.  Each  word  is 
used  in  many  utterances  with  varying  implications.  We  may  form  theories  or  models 
about  how  the  word  is  used.  There  are  typically  many  models  associated  with  a 
particular  word.  Each  model  describes  only  a  subset  of  utterances  containing  the 
word.  Novel  and  exceptional  uses  of  the  word  will  always  appear.  The  category  theory 
handles exceptions by attempting to find existing utterances in the case-based memory
that  can  be  related  by  structural  similarity  to  the  new  usage.  Accommodating  new 
utterances  results  in  altering  the  structure  of  categories,  possibly  also  creating  a  new 
model. 

If  the  meaning  of  a  word  could  be  pointed  to,  one  would  point  to  the  structure 
of  the  categories  containing  utterances  of  the  word.  The  category  structure  is  very 
complex  and  contains  a  large  number  of  cases  and  models.  The  category  structure 
essentially  gathers,  organizes,  and  tries  to  explain  every  single  utterance  of  a  word. 
But  the  category  structure  exists  in  a  physical  memory  and  has  a  particular  config- 
uration at  any  point  in  time.  To  say  that  this  represents  the  word  meaning  is  not 
quite  correct  since  a  new  usage  of  the  word  may  appear  at  any  time  that  would  not 
conform  to  a  particular  structure.  Word  understanding  is  a  dynamic  process. 

3.1.1    Purpose  of  Categories

In knowledge representation systems based on inheritance networks [94,6,9,45],
the  role  of  a  class  object  has  had  many  interpretations.  In  most  cases,  class  objects 
contain  attributes  and  values  that  are  inherited  by  subordinate  classes  and  instances. 
In  others,  classes  hold  properties  that  are  generalizations  over  subordinate  classes  and 
instances.  In  general,  class  objects  are  considered  to  define  conditions  that  are  true 
and/or  which  must  be  true  of  the  subordinate  classes  and  instances.  That  is,  class 
objects  provide  necessary  and  sufficient  definitions  for  class  membership. 

Category  theory  has  produced  a  new  view  of  class  descriptions.  Because  of  the 
family  resemblance  nature  of  category  cohesion,  it  is  possible  that  class  descriptions 
contain  little  or  no  information.  It  may  be  that  subordinates  do  not  have  anything 
significant  in  common.  Thus,  class  descriptions  alone  cannot  be  used  to  determine 
category  membership.  The  information  content  of  the  category  is  obtained  by  the 
complex  interactions  of  the  class,  subclasses,  and  instances  within  a  class. 

The purpose of categories is to summarize the behavior of a group of instances.
Categories  are  generalizations.  Categories  provide  a  quick  way  of  referring  to  a  group 
of  instances.  The  need  to  reference  a  particular  group  of  instances  is  usually  driven 
by  some  goal,  and  this  goal  plays  a  direct  role  in  determining  the  structure  of  the 
category.  Categories  allow  inferences  to  be  made  on  the  basis  of  partial  information.  If 
an  instance  can  be  shown  to  belong  to  a  particular  class,  then  other  information  about 
the instance not explicitly given may be inferred inductively. Categories correspond
to the original concept of "entity type" and "class" in semantic data modeling and
knowledge  representation.  Finally,  words  in  natural  language  map  to  categories. 
Thus,  complex  propositions  about  categories  can  be  stated  easily  in  natural  language. 

The  purpose  of  categories  may  be  viewed  as  providing  a  way  to  cluster  together 
instances  that  are  in  some  way  similar.  But  there  would  be  little  point  in  such 
activity  if  the  resulting  classes  did  not  serve  other  purposes.  Rather,  it  is  the  job  of 
a  conceptual  clustering  algorithm  to  cluster  instances  into  classes  for  the  purpose  of 
satisfying  some  goal.  A  conceptual  clustering  algorithm  should  be  based  on  a  sound 
theory  of  category  formation.  Such  a  theory  is  presented  in  the  following  sections.  A 
formal  conceptual  clustering  algorithm  is  presented  in  the  next  chapter.

3.1.2  Classical  View  of  Categories

The  new  category  theory  contrasts  with  an  earlier  view  known  as  the  classical 
view  of  categories.  In  the  classical  view,  categories  have  an  ontological  status  as 
independent  entities  that  exist  apart  from  an  observer.  They  exist  objectively  as  part 
of  the  external  world.  In  the  extreme,  categories  are  platonic  forms.  Things  in  the 
world  come  prepackaged  into  categories,  and  it  is  our  job  to  discover  these  categories. 

In  the  classical  view,  category  membership  is  determined  by  necessary  and  suffi- 
cient conditions.  Words  have  a  single,  essential  meaning.  The  individual  members 
of  a  category  may  exhibit  great  differences,  but  they  all  have  something  in  common.
Likewise,  a  word  may  be  used  in  many  diverse  situations,  but  there  is  something
essentially  the  same  about  all  these  situations. 

Structured  knowledge  representation  languages  such  as  frames,  semantic  networks, 
and  conceptual  dependencies  have  long  been  used  for  representing  language  meaning. 
In  these  systems,  classes  of  structured  objects  are  created  by  establishing  relation- 
ships (through  slots  or  links)  between  entities  represented  by  symbols.  The  meaning 
of  a  natural  language  expression  is  represented  by  combining  smaller  structured  ob- 
jects representing  words  and  phrases  together  into  larger  groups  representing  entire 
expressions. 

The  departure  from  traditional  knowledge  representation  approaches  has  to  do 
with  the  role  of  models,  theories,  rules,  and  abstract  objects  in  representing  word 
meaning.  Essentially,  the  meaning  of  a  word  cannot  be  represented  by  a  simple  defi- 
nition in  the  form  of  a  propositional  statement.  The  notion  that  dictionary  definitions 
are  sufficient  to  represent  word  meanings  is  too  simple.  Such  definitions  do  not  con- 
tain enough  information  to  represent  the  diversity  of  ways  in  which  a  particular  word 
may  be  used. 

Likewise,  the  information  content  of  lexical  entries  used  in  current  natural  lan- 
guage processing  systems  is  impoverished  in  comparison  to  what  is  needed.  For 
example,  in  Lexical-Functional  Grammar  [51],  a  lexical  entry  contains  only  such
information  as: 

hand,  transitive  verb,  pred  =  hand(Subject,  Direct  Object,  Indirect  Object) 

which  is  just  a  predicate  template  for  the  verb  "hand"  with  a  possible  argument 
structure.  In  the  "Naive  Semantics"  proposed  by  Dahlgren  [17],  lexical  entries  have
a  richer  structure.  The  claim  is  made  that:

"Naive  Semantic  representations  of  generic  knowledge  contain  fifteen  or 
more  pieces  of  information  per  word,  relatively  more  than  required  by 
other  theories."

In  contrast,  the  new  category  theory  would  map  each  lexical  entry  to  a  massive  set
of  associated  cases  and  theories.  Experiments  with  the  lexical  acquisition  algorithm
(Section  5.3.4)  indicate  that  a  few  dozen  classes  and  several  hundred  cases  per  word
would  not  be  unusual.  There  is  simply  a  tremendous  amount  of  information  known 
about  a  word. 

The  problem  of  categorization,  how  people  place  objects  into  classes,  general- 
izes and  illustrates  the  difficulty.  In  most  representation  systems  used  in  artificial 
intelligence,  categories  are  defined  by  formal  propositions  that  state  necessary  and 
sufficient  conditions  for  category  membership.  This  is  evident,  for  example,  in  term 
subsumption  languages  such  as  KL-ONE  [9].  An  instance  is  a  member  of  a  category 
(KL-ONE  Concept  node)  if  it  satisfies  the  restrictions  specified  by  the  Concept  node. 
In  machine  learning,  algorithms  such  as  ID3  [91]  and  CLUSTER  [114]  automatically 
generate  classification  schemes  by  examining  the  attributes  of  sets  of  instances.  The 
result  is  a  decision  tree  in  ID3  which  is  traversed  to  determine  class  membership  of
a  new  instance,  or  a  class  description  in  CLUSTER  which  states  the  properties  an 
instance  must  have  to  be  a  member  of  a  class.  Although  these  systems  represent 
the  first  attempts  at  generating  categories  through  machine  learning,  the  resulting 
categories  are  primitive  and  do  not  always  match  with  culturally  accepted  natural 
categories  such  as  those  which  determine  word  meaning.  Expert  systems  are  equally 
rigid  since  a  fixed  rule  is  used  to  determine  what  is  the  case.  The  problem  is  that 
rules  have  exceptions.  Categories  based  on  necessary  and  sufficient  conditions  do  not 
accurately  represent  real  categories. 
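
The brittleness of membership by necessary and sufficient conditions can be illustrated with a minimal sketch. It is not taken from KL-ONE or any system cited above; the attribute names, the `BIRD` restrictions, and the `is_member` function are hypothetical and exist only to show how a fixed rule rejects an exceptional member.

```python
from typing import Any, Callable, Dict

Instance = Dict[str, Any]
Restrictions = Dict[str, Callable[[Any], bool]]

# Hypothetical class description for "bird": each entry restricts one
# attribute, and membership requires every restriction to hold.
BIRD: Restrictions = {
    "has_feathers": lambda value: value is True,
    "can_fly": lambda value: value is True,
}


def is_member(instance: Instance, restrictions: Restrictions) -> bool:
    """Classical test: the instance must satisfy every restriction."""
    return all(
        attribute in instance and check(instance[attribute])
        for attribute, check in restrictions.items()
    )


robin = {"has_feathers": True, "can_fly": True}
penguin = {"has_feathers": True, "can_fly": False}

print(is_member(robin, BIRD))    # True
print(is_member(penguin, BIRD))  # False: the fixed rule rejects the exception
```

Loosening the rule to admit the penguin (for example, by dropping `can_fly`) would also admit instances that are not birds, which is the sense in which such definitions cannot accommodate exceptions.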

It  is  not  possible  to  give  simple  definitions  of  words  in  the  form  of  necessary 
and  sufficient  conditions.  The  multiple  ways  in  which  a  word  can  be  used  form  a 
category,  but  it  is  difficult  to  say  what  all  the  members  of  this  category  have  in 
common.  Rather,  the  association  results  from  an  almost  intangible  similarity  among
the  members.  This  is  the  "family  resemblance"  problem  identified  by  Wittgenstein.
Considerable  research  conducted  in  psychology  since  the  mid-1970s  has  supported
and  elaborated  this  view  [84,61,96]. 

Other  classical  approaches  to  forming  categories,  such  as  distance  in  feature  space 
or  specification  of  an  exemplar  or  prototypical  class  member,  are  also  too  simple  to  be 
functional.  In  the  case  of  feature  space,  an  arbitrary  threshold  inevitably  is  needed  to 
identify  borderline  cases.  Specification  of  a  prototype  does  not  help  in  deciding  how 
much  like  the  prototype  an  instance  must  be.  Canceling  of  default  values  specified 
by  the  prototype  leads  to  incoherent  systems  of  reasoning. 

The  difficulty  of  using  necessary  and  sufficient  propositions  for  determining  cate- 
gory membership  is  that  they  cannot  deal  with  exceptions.  A  simple  definition  does 
not  capture  the  tremendous  diversity  in  the  number  of  ways  a  single  word  can  be 
used.  Furthermore,  any  language  processing  system  must  have  the  capacity  to  learn, 
to  acquire  new  word  senses,  since  1)  language  is  constantly  changing  and  2)  the
enormous  job  of  knowledge  acquisition  needed  to  build  robust  language  processing 
systems  cannot  be  done  entirely  by  hand. 

The  classical  theory  is  not  a  complete  theory  of  categorization  because  it  cannot 
deal  with  exceptional  cases.  Classical  knowledge  representation  techniques  are  brittle; 
they  cannot  deal  with  situations  that  go  beyond  their  boundaries.  Yet  humans  form 
categories  with  highly  diverse  elements  and  can  easily  recognize  borderline  cases.

3.1.3  A  Modern  View  of  Categories

The  new  view  of  categories  is  a  compendium  of  several  sources.  The  categorization 
problem  is  an  important  topic  in  cognitive  science,  including  psychology  [61,84,44], 
philosophy  [104],  artificial  intelligence  [114,25,43,56,64],  and  database  management 
[20].  Categories  exhibit  many  interesting  and  unusual  characteristics  which  have 
been  largely  ignored  in  data  modeling  and  knowledge  representation.  Some  of  the
important  aspects  of  categories  will  now  be  discussed.

3.1.3.1  Family  resemblance 

Class  membership  cannot  be  defined  by  simple  necessary  and  sufficient  conditions. 
There  is  very  little  that  all  members  of  a  class  have  in  common.  Rather  they  are 
related  only  by  a  vague,  often  indescribable,  family  resemblance  [125].  Classes  consist 
of  a  very  complex  clustering  of  concepts.  In  addition,  effects  such  as  prototypes,  basic 
levels,  similarity,  boundary  effects,  and  exceptions  are  phenomena  to  be  explained. 

The  family  resemblance  effect  is  the  trend  for  a  class  description,  that  is,  a  description
which  applies  to  all  the  instances  of  the  class,  to  contain  less  and  less  information
as  the  number  of  instances  in  the  class  increases.  As  the  number  of  instances  becomes 
large,  there  is  very  little  that  all  the  instances  in  the  class  have  in  common.  Thus, 
it  is  not  possible  to  state  necessary  and  sufficient  conditions  for  class  membership 
by  listing  characteristics  that  an  instance  must  have  to  be  a  member  of  the  class. 
This  means  that  user-defined  predicates  of  the  ECR  model  [20]  or  the  necessary  and 
sufficient  attribute  restrictions  of  term  subsumption  languages  [9]  are  not  adequate 
for  determining  class  membership. 
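
The shrinking of a class description can be made concrete with a small sketch. It assumes, purely for illustration, that each instance is a flat set of feature names and that the class description is the intersection of those sets; the `games` data and the `class_description` function are hypothetical and are not part of the system described here.

```python
from functools import reduce
from typing import FrozenSet, List


def class_description(instances: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the features shared by every instance (empty for no instances)."""
    if not instances:
        return frozenset()
    return reduce(lambda shared, features: shared & features, instances)


# Hypothetical instances of the category "game".
games = [
    frozenset({"competitive", "uses_board", "two_players"}),  # chess
    frozenset({"competitive", "uses_ball", "team"}),          # soccer
    frozenset({"uses_cards", "one_player"}),                  # solitaire
]

# Size of the class description as instances are added one at a time.
sizes = [len(class_description(games[:n])) for n in range(1, len(games) + 1)]
print(sizes)  # [3, 1, 0]
```

With three instances nothing remains in common, even though each pair of neighboring instances in a larger collection may still share features.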

3.1.3.2  Prototypes  and  defaults 

A  theory  of  categorization  must  account  for  prototypes  and  defaults.  Default 
values  are  values  of  attributes  that  instances  of  a  category  typically  have.  Yet  it 
is  not  required  that  all  instances  of  the  category  have  these  default  values.  The 
default  value  may  be  used  in  the  absence  of  a  specified  value  as  long  as  it  does  not 
contradict  other  known  facts  [93].  Similarly,  for  a  given  category,  there  may  exist  a
hypothetical  prototype  instance.  The  prototype  represents  a  typical  member  of  the 
category.  Certain  category  members  are  more  typical  or  representative  of  the  category 
than  others.  For  example,  robin  is  more  typical  of  bird  than  penguin.  The  prototype 
instance  may  or  may  not  exist  as  an  actual  instance.  It  has  been  demonstrated
experimentally  [96]  that  a  particular  instance  can  be  rated  in  terms  of  its  degree  of
typicality  to  a  category.  The  more  closely  an  instance  matches  the  prototype,  the 
stronger  the  degree  of  typicality. 

One  question  is  whether  prototypes  and  default  values  need  to  be  represented 
explicitly.  Many  systems  explicitly  represent  prototypes  as  a  data  structure  stored  in
the  class  object  [58].  The  approach  below  (Section  4.2.6)  computes  prototypes  and 
default  values  by  reasoning  over  the  set  of  class  instances.  Most  data  models  ignore 
defaults  and  prototype  effects  and  treat  all  instances  of  a  class  as  equal  members. 

In  the  case  of  word  meanings,  some  usages  are  more  typical  than  others.  One  way 
to  explain  this  prototype  effect  is  through  default  values  obtained  by  reasoning  over 
the  set  of  cases.  This  does  not  require  that  a  prototype  case  be  represented  explicitly. 
For  example,  consider  "John  is  57."  The  default  interpretation  of  this  phrase  is  "57
years  old."  Note,  however,  that  57  might  also  mean  height  in  inches.  But  in  terms
of  the  number  of  times  the  expression  <Person> IS <Number>  is  used,  it  is  most
often  used  in  the  sense  of  age.  Thus,  the  ambiguity  can  be  resolved  by  considering 
the  default  over  the  number  of  cases  the  expression  has  been  used  in  the  past. 
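
One simple way to compute such a default is to count senses over the stored cases. The sketch below is only an illustration under that assumption; it is not the procedure of Section 4.2.6, and the case base, the sense labels, and the `default_sense` function are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# A case pairs an expression pattern with the sense in which it was used.
Case = Tuple[str, str]


def default_sense(cases: List[Case], pattern: str) -> Optional[str]:
    """Return the most frequent past sense of a pattern, or None if unseen."""
    counts = Counter(sense for seen, sense in cases if seen == pattern)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical case base of past uses of the expression.
cases: List[Case] = [
    ("<Person> IS <Number>", "age_in_years"),
    ("<Person> IS <Number>", "age_in_years"),
    ("<Person> IS <Number>", "height_in_inches"),
    ("<Person> IS <Number>", "age_in_years"),
]

print(default_sense(cases, "<Person> IS <Number>"))  # age_in_years
```

No prototype case is stored anywhere; the default emerges from the cases and changes as new cases are added.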

3.1.3.3  Basic  level 

Vertical  dimension  in  a  class  generalization  hierarchy  refers  to  the  "top  to  bottom" 
or  "general  to  specific"  dimension.  The  basic  level  effect  [75]  says  that  certain  classes 
along  the  vertical  dimension  are  more  fundamental  than  others.  For  example,  in  the 
progression  collie  -  dog  -  mammal  -  animal,  "dog"  is  the  basic  level.  Basic  level  classes 
group  instances  of  similar  appearance  or  function.  Basic  level  classes  are  usually  the 
first  classes  to  be  learned. 

3.1.3.4  Similarity  and  case-based  reasoning 

The  process  of  recognition,  deciding  how  to  put  a  particular  object  into  a  category, 
is  not  dictated  by  a  fixed  law  or  theory  of  category  membership.  Rather,  the  emphasis
is  on  the  particulars  of  the  instance  being  considered.  In  case-based  reasoning
(CBR)  [57],  a  new  instance  is  analyzed  by  comparing  its  characteristics  to  instances 
previously  encountered.  By  this  principle,  a  lexicon  should  contain  a  large  case-base 
of  instances  in  which  a  word  has  been  used.  When  another  instance  is  encountered, 
conclusions  about  how  the  word  is  being  used  can  be  made  by  comparing  it  to  ways 
the  word  was  used  previously.  Most  likely,  the  new  use  will  not  match  any  of  the 
previous  cases  exactly,  but  rather  there  will  be  a  similarity  or  partial  matching,  often 
an  analogical  or  metaphorical  mapping.  It  is  the  ability  to  analyze  new  situations 
that  are  slightly  different  from  previously  acquired  experiences  that  gives  flexibility
to  case-based  reasoning. 

In  case-based  reasoning  the  learning  process  is  incremental.  As  each  new  case 
comes  into  the  system,  the  existing  memory  structure  is  dynamically  altered  until  a 
new  configuration  is  reached  that  accounts  for  the  new  case.  In  the  lexical  acquisition 
process,  the  evolution  of  word  usage  is  also  such  an  incremental  process. 

In  the  early  stages  of  category  formation,  the  characteristics  of  instances  are  im- 
portant in  determining  category  membership  [54].  This  requires  that  two  instances  be 
compared  for  similarity.  The  comparison  is  made  on  the  basis  of  instance  similarity 
defined  in  terms  of  the  instances'  structure.  The  goal  in  CBR  is  to  retrieve  instances 
from  memory  that  can  somehow  be  related  to  the  new  instance.  The  emphasis  is  on 
reasoning  about  instances,  whereas  in  virtually  all  data  models  and  knowledge  repre- 
sentation systems  the  emphasis  is  on  comparing  two  class  descriptions  or  comparing 
an  instance  to  a  class  description. 
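
Instance-to-instance retrieval can be sketched as follows. The sketch assumes that instances are flat feature sets and uses the Jaccard coefficient as the similarity measure; both are simplifying assumptions made for illustration, since the text defines similarity over instance structure. The case names and the `retrieve` function are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared features divided by all features (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(
    new_case: FrozenSet[str],
    case_base: Dict[str, FrozenSet[str]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k stored cases most similar to the new case."""
    scored = [(name, jaccard(new_case, feats)) for name, feats in case_base.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical stored uses of the word "hand".
case_base = {
    "hand_over_object": frozenset({"verb", "agent", "recipient", "physical_object"}),
    "hand_of_clock": frozenset({"noun", "part_of_clock"}),
}

# A new use that matches no stored case exactly.
new_use = frozenset({"verb", "agent", "recipient", "abstract_object"})
best = retrieve(new_use, case_base)
print(best[0][0])  # hand_over_object
```

The new use is matched to a previous case by partial overlap, without consulting any class description.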

Moving  beyond  the  classical  view  requires  lowering  the  status  of  rules,  definitions, 
and  conditions.  Understanding  a  particular  situation  depends  not  just  on  how  that 
situation  conforms  to  pre-existing  ideals,  but  on  the  details  and  unique  features  of 
the  situation.  The  particulars  of  the  case  are  just  as  important.

The  movement  away  from  theory  towards  the  study  of  individual  instances  is  the
main  theme  in  the  field  of  case-based  reasoning.  For  example,  extracting  rules  from 
experts  may  be  difficult  because  experts  do  not  operate  from  rules.  Instead,  experts 
have  an  enormous  body  of  experience  in  the  form  of  cases.  When  faced  with  a  new 
problem,  the  expert  retrieves  one  or  more  cases  that  are  similar  to  the  new  problem 
and  applies  previous  solutions  as  appropriate.  Thus,  the  new  situation  is  compared 
to  previous  situations,  not  to  a  general  rule. 

In  terms  of  categorization,  new  instances  are  not  compared  to  a  generalized  class 
definition,  but  rather  to  other  instances.  Thus,  instances  are  clustered  together  be- 
cause they  have  similar  properties,  even  though  there  may  be  nothing  in  common 
among  all  instances  in  the  class. 

This  approach  depends  on  a  memory  organization  that  is  capable  of  storing  and 
retrieving  large  numbers  of  cases.  The  trend  toward  such  large  scale  knowledge  bases 
is  evident  in  a  number  of  techniques  such  as  memory-based  parsing  [113]  and  Minsky's 
Society  of  Mind  [79].  In  these  systems,  the  ability  to  reason  about  new  and  unusual
situations  is  a  result  of  the  vast  number  of  cases  available. 

Problems  with  similarity-based  approach.  If  categories  are  to  be  formed  on  the 
basis  of  matching  instances  with  similar  features,  there  is  now  the  problem  of  what 
counts  as  being  similar.  Descriptions  of  two  instances  must  be  compared.  But  how 
can  a  description  of  an  instance  be  created  in  the  first  place? 

Basic  building  blocks  are  needed  for  forming  descriptions.  Some  relationships  may 
exist  as  epistemological  primitives.  Object,  ISA,  part-of,  related-to,  instance-of,  and 
time/space  relationships  may  be  given  as  part  of  an  a  priori  vocabulary  for  building 
descriptions.  But  these  cannot  explain  the  formation  of  higher-level  concepts.  It  is  not 
fair  to  say  that  feathers,  beak,  and  wings  are  independent  properties  which  constitute 
bird,  since  these  concepts  have  meaning  only  within  the  concept  of  bird  [73].  That  is,
they  must  be  connected  within  a  bird  structure.  For  example,  part  of  what  makes  a
wing  a  wing  is  that  it  is  part  of  a  bird.  Thus  it  is  begging  the  question  to  define  a  bird 
in  terms  of  these  constituent  parts.  In  many  cases,  properties  are  correlated,  such 
as  "swims/webbed  feet."  Seeing  the  correlation  requires  an  understanding  beyond 
comparison  of  common  features. 

The  very  notion  of  similarity  implies  that  we  recognize  objects  that  have  features 
in  common.    The  problem  is  that  features  are  themselves  products  of  theories,  of 
our  pre-conceived  views  of  the  world.  They  are  not  independent  properties  existing 
objectively  apart  from  the  observer.

3.1.3.5  Cognitive  models  and  explanation-based  learning

Empirical  reasoning  about  instance  similarity  alone  is  not  sufficient.  Our  theories 
and  beliefs  about  the  world  do  play  a  role  in  categorization.  It  is  not  even  possible 
to  build  descriptions  of  instances  without  making  assumptions  based  on  beliefs  about
the  nature  of  the  world.  Instance  descriptions  cannot  be  created  independently  of 
theory.  Certainly,  mere  physical  appearance  does  not  capture  the  nature  of  an  entity. 
For  example,  it  is  not  fair  to  say  that  a  chair  is  the  sum  of  arms,  legs,  and  back,  since 
these  notions  do  not  occur  independently  of  the  concept  of  chair. 

Cognitive  models  are  theories  about  the  world.  Cognitive  models  play  an  impor- 
tant role  in  category  formation  [61].  Instances  belong  in  a  category  to  the  extent 
that  they  conform  to  cognitive  models  associated  with  the  category.  A  database  class 
description  is  one  type  of  cognitive  model.  Other  types  include  frames  [78],  scripts 
[102],  goal  dependency  networks  [114],  causal  graphs  [39],  means-ends  plans  [127], 
explanations  [128],  and  qualitative  simulations.  Qualitative  simulation  is  explored 
further  in  Chapter  6. 

Explanation-based  learning  (EBL)  is  a  machine  learning  technique  in  which  models
are  selected  and  modified  if  necessary  to  explain  the  observed  situation  [80].  Techniques
of  EBL  are  applied  in  the  conceptual  clustering  algorithm  (Section  4.2)  to  fit
class  descriptions  to  instances.  EBL  techniques  complement  the  CBR  techniques  in 
that  EBL  is  a  top-down  and  CBR  a  bottom-up  approach.  Unfortunately,  EBL  is 
in  an  early  stage  of  development  and  is  not  yet  capable  of  generating  entirely  new 
models. 

Cognitive  models  can  be  used  in  CBR.  For  example,  in  analyzing  cases  in  which
a  machine  malfunctions,  the  symptoms  can  be  compared  with  a  causal  model  of
how  the  machine  works  [42].  The  causal  model  explains  the  relationships  between
the  symptoms  and  problem  and  could  be  used  to  weight  the  symptoms  as  being 
very  important  (such  as  smoke  in  the  exhaust  of  a  gasoline  engine)  or  unimportant 
(scratched  paint).  Note  that  without  the  model  it  is  difficult  to  correlate  the  features 
of  the  situation  or  assign  significance  to  features. 
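
The effect of a causal model on case matching can be sketched by letting the model supply feature weights. The weights, feature names, and the `weighted_similarity` function below are hypothetical; the sketch only assumes that a causal model can be reduced, for matching purposes, to a table of feature importances.

```python
from typing import Dict, Set

# Hypothetical importances that a causal model of a gasoline engine might assign.
WEIGHTS: Dict[str, float] = {
    "smoke_in_exhaust": 1.0,
    "rough_idle": 0.5,
    "scratched_paint": 0.0,
}


def weighted_similarity(
    a: Set[str], b: Set[str], weights: Dict[str, float], default: float = 0.1
) -> float:
    """Weight of shared features divided by weight of all features."""
    def total(features: Set[str]) -> float:
        return sum(weights.get(feature, default) for feature in features)

    union_weight = total(a | b)
    return total(a & b) / union_weight if union_weight else 0.0


new_case = {"smoke_in_exhaust", "scratched_paint"}
engine_case = {"smoke_in_exhaust", "rough_idle"}
paint_case = {"scratched_paint"}

# With the model's weights, the engine case is the better match.
with_model = (
    weighted_similarity(new_case, engine_case, WEIGHTS),
    weighted_similarity(new_case, paint_case, WEIGHTS),
)

# With no model (every feature weighted 1.0), the paint case wins instead.
without_model = (
    weighted_similarity(new_case, engine_case, {}, default=1.0),
    weighted_similarity(new_case, paint_case, {}, default=1.0),
)
```

Under these assumed weights the ranking of the two stored cases reverses once the model is consulted, which is the sense in which the model assigns significance to features.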

Categorization  can  be  viewed  as  resulting  from  an  interaction  between  theories  and 
observations.  That  is,  case-based  reasoning  is  combined  with  explanation-based  learn- 
ing. This  trend  is  evident  in  recent  work  in  machine  learning  in  which  empirical-based 
approaches  are  being  merged  with  explanation-based  learning  [105].  This  situation  is 
exactly  like  that  of  the  relationship  between  scientific  theory  and  experimental  data. 
Empirical  observations  suggest  theories.  Theories  tell  what  data  to  observe.  Obser- 
vations confirm  or  contradict  theories.  Such  an  approach  is  marked  by  continuous 
anarchy  and  revolution,  characteristic  of  shifts  in  scientific  paradigm  [60]. 

Formalizing  the  Notion  of  Cognitive  Models.  Constant  conflicts  between  empiri- 
cal evidence  and  theories  result  in  continuing  shifts  in  perception,  a  process  not  unlike 
that  of  scientific  revolution.  There  is  not  one  model  associated  with  a  category,  but 
many,  often  conflicting  models.  Thus,  a  vast  number  of  models  must  be  included 
along  with  a  vast  number  of  cases  in  the  lexical  knowledge  base. 

It  may  seem  paradoxical  that  on  one  hand  cognitive  models  are  possible  and  play
a  role  in  category  formation,  while  on  the  other  hand  this  contradicts  the  family
resemblance  claim  that  such  class  description  is  not  possible.  The  solution  to  the 
paradox  is  that  there  is  usually  not  one,  but  many,  different  and  often  contradictory 
cognitive  models  associated  with  a  class.  For  example,  only  a  portion  of  the  observed 
instances  of  a  class  may  fit  a  particular  database  class  description. 

If  cognitive  models  play  such  a  strong  role  in  categorization,  then  it  would  be  im- 
portant to  formalize  these  models.  Unfortunately,  the  cognitive  psychologists  appear 
to  have  only  a  general  notion  of  the  nature  of  these  models.  In  few  cases  have  any 
models  actually  been  formally  "written  down."  Although  the  major  representational 
techniques  (rules,  frames,  scripts,  semantic  networks,  and  conceptual  dependencies) 
have  been  suggested  as  being  like  cognitive  models,  there  appears  to  be  some  caution 
in  fully  endorsing  these  as  formal  representations  of  cognitive  models. 

The  notion  of  cognitive  model  must  be  formalized.  But  what  form  should  this 
notation  take?  A  formal  cognitive  model  in  symbolic  notation,  such  as  a  qualitative 
simulation,  would  have  the  same  properties  as  the  original  classical  theory,  and  hence
return  to  all  the  original  problems.  Any  particular  model  is  capable  of  accounting  for 
only  a  specific  domain  of  instances.  There  are  always  exceptions  and  limits  to  what 
any  model  is  capable  of  describing. 

The  mistake  is  to  give  too  much  authority  to  the  work  performed  by  a  model.  The 
solution  to  this  dilemma,  if  any,  is  to  view  a  model  not  as  the  ultimate  model,  but  as 
one  of  many  possible  theories.  For  there  are  many  models  for  representing  any  concept 
or  situation,  and  one  or  more  of  them  may  be  appropriate  for  explaining  or  reasoning 
in  any  particular  instance.  Even  for  simple  concepts,  a  vast  number  of  models  are 
involved  at  various  levels  of  resolution  and  complexity  [130,131,27].  Expanding  upon 
the  theme  in  case-based  reasoning,  the  memory  organization  includes  a  vast  number
of  models,  clustered  around  concept  categories.  In  addressing  a  new  problem,  relevant
models  can  be  selected  from  among  many  possible  models.  This  process  parallels  the
selection  of  previous  cases  which  may  have  a  bearing  on  the  problem. 

Such  a  memory  organization  would  be  capable  of  explaining  variations  and  excep- 
tions. With  many  models  and  cases  to  choose  from,  the  suitable  model  for  describing 
an  unusual  situation  can  be  retrieved  and  adapted.  An  exception  to  one  model  would 
be  explained  by  another,  different  model. 
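
A minimal sketch of model selection from a pool of competing models follows. It assumes that each model can be reduced to a predicate over an instance; the model names and the `explaining_models` function are hypothetical and greatly simplify the cognitive models discussed above.

```python
from typing import Any, Callable, Dict, List

Instance = Dict[str, Any]
Model = Callable[[Instance], bool]

# Hypothetical models clustered around the concept "bird".
BIRD_MODELS: Dict[str, Model] = {
    "flying_bird": lambda x: bool(x.get("has_feathers") and x.get("can_fly")),
    "swimming_bird": lambda x: bool(x.get("has_feathers") and x.get("swims")),
}


def explaining_models(instance: Instance, models: Dict[str, Model]) -> List[str]:
    """Return the names of all models that account for the instance."""
    return [name for name, model in models.items() if model(instance)]


robin = {"has_feathers": True, "can_fly": True}
penguin = {"has_feathers": True, "can_fly": False, "swims": True}

print(explaining_models(robin, BIRD_MODELS))    # ['flying_bird']
print(explaining_models(penguin, BIRD_MODELS))  # ['swimming_bird']
```

The penguin is an exception to the first model but is accounted for by the second, so the category as a whole still covers it.
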
3.1.3.6   The  symbol  grounding  problem  and  categorical  perception 

Consider  a  world  devoid  of  experiences  and  consisting  only  of  symbols  and  logic. 
Can  a  language  that  is  based  only  on  symbols  have  meaning?  The  symbol  grounding 
problem  has  been  introduced  by  Harnad  [44].  Harnad  argues  that  symbol  systems
which  are  given  a  semantics  only  through  reference  to  other  symbols  are  not  grounded. 
For  example,  a  dictionary  is  not  grounded  since  the  words  are  defined  using  other 
words.  A  system  based  on  Katz's  semantic  markers  [53]  is  not  grounded  since  here  a 
natural  language  expression  is  merely  translated  into  "markerese"  [66].  For  the  same 
reason,  translation  of  a  natural  language  expression  into  a  formal  logic  statement  is 
ungrounded.  A  programmer  writing  an  artificial  intelligence  program  is  creating  a 
system  of  symbols  that  are  not  grounded,  since,  according  to  Harnad,  the  symbols 
are  not  related  to  anything  in  the  world.  Such  programs  are  deprived  of  experience 
and  sensory  input.  Likewise,  a  system  of  categories  created  without  such  input  is 
ungrounded. 

Harnad  claims  that  symbols  can  be  grounded  by  connecting  them  with  categories 
which  are  themselves  derived  from  direct  sensory  experience.  Harnad  proposes  a 
three-level  representation  system  for  categorization.  The  first  level  is  iconic,  a  direct 
analogical  coding  of  sensory  input.  Icons  are  analogical  representations  in  the  sense 
that  a  photograph  is  a  representation.  The  second  level  is  the  creation  of  categories 
from  these  icons  through  filtering.  Though  the  details  of  these  filters  are  unclear,  and
at  times  it  sounds  as  though  Harnad  is  endorsing  a  classical  view  of  categories,  these
filters  perform  the  function  of  grouping  iconic  representations  into  categories  based 
on  similarity  and  common  features.  Finally,  the  third  level  introduces  symbols  by 
associating  symbolic  labels  (e.g.  words  or  mental  descriptions)  to  categories.  Such 
symbols  enable  powerful  manipulations  of  categories  through  propositional  statements 
expressed  using  these  symbols  (e.g.  sentences).  This  three-level  representation  is  a 
grounded  symbol  system  since  the  symbols  are  ultimately  connected  to  direct  expe- 
rience through  the  iconic  layer. 

Fodor  has  provided  a  similar  argument  for  symbol  grounding.  Symbols  have  mean- 
ing via  a  causal  chain  by  which  they  are  connected  with  entities  in  the  physical  world 
[31].  Thus,  Fodor  proposes  a  kind  of  denotational  semantics,  and  the  mechanism 
for  establishing  the  denotation  is  a  causal  chain.  The  initial  segments  of  this  chain 
are  grounded  by  psychophysics,  that  is,  they  can  be  explained  in  terms  of  physical 
conditions.  For  example,  the  connection  between  the  red  color  of  a  physical  object, 
and  the  idea  of  "red"  occurs  under  a  set  of  physical  conditions: 

"Paint  the  wall  red,  turn  the  lights  up,  point  your  face  toward  the  wall, 
and  open  your  eyes.  The  thought  "red  there"  will  occur  to  you;  just  try 
it  and  see  if  it  doesn't." 

Psychophysical  laws  determine  how  changes  in  the  environment  affect  the  symbolic 
realization  of  our  sensations.  For  example,  the  causal  chain  from  red  objects  to  the 
symbol  for  "red"  in  our  brains  is  mediated  by  physics  and  the  conditions  of  observa- 
tion. Light  hits  the  object,  bounces  off  (all  but  red  lightwaves  being  absorbed  in  the 
process),  hits  the  eye,  impacts  the  retina,  and  initiates  a  sequence  of  neural  impulses 
that  ultimately  result  in  the  idea  "red."  Symbols  denoting  complex  concepts  such  as 
"horse,"  however,  are  not  related  directly  to  an  observable  physical  property.  That 
is,  horses  do  not  radiate  horseness.  But  there  is  still  a  causal  chain  connecting  the 
symbol  "horse"  to  observed  horses.  This  chain  is  mediated  by  a  horse  theory.  Thus,
theories  are  like  transducers  which  take  sensations  as  input  and  produce  symbols  (e.g.
"horse")  as  output.  The  role  of  theories  described  by  Fodor  is  thus  much  like  that 
of  cognitive  models.  The  details  of  a  theory  are  not  important,  in  fact  the  theory 
may  even  be  wrong.  All  that  is  required  is  that  a  causal  chain  be  established  from 
physical  entity  to  symbol,  and  some  of  the  links  of  this  chain  may  be  theories. 

In  general,  today's  computers  do  not  have  access  to  direct  sensory  input.  This  cre- 
ates one  of  the  biggest  problems  for  artificial  intelligence.  Though  a  symbol  grounding 
theory  can  be  postulated,  it  will  be  difficult  to  implement  or  even  design  based  on 
what  is  currently  known  about  vision,  speech  recognition,  and  so  forth.  This  would 
appear  to  be  a  dead  end  were  it  not  for  the  programmer.  In  spite  of  Harnad's  criticisms,
the  programmer  can  act  as  the  eyes  and  ears  of  the  program.  It  is  claimed  that
a  symbol  system  can  be  grounded  through  the  programmer.  This  is  because  the  pro- 
grammer completes  the  causal  chain  between  the  physical  world  and  the  computer's 
symbols  as  required  by  Fodor.

3.1.3.7  A  priori  concept  network

Neither  instances  nor  models  may  be  constructed  without  a  representational  language.
Of  course,  this  is  required  in  systems  using  formal  symbolic  notations.  The
problem  of  primitives  and  bootstrapping  concepts  should  be  handled  in  a  domain- 
independent  fashion.  Epistemological  primitives  such  as  ISA,  part/whole  relation- 
ships, aggregation,  and  interaction,  can  be  used  to  build  structural  descriptions  of 
concepts.  In  computational  systems,  an  initial  network  of  concepts  must  be  created 
by  hand,  since  the  computer  has  the  disadvantage  of  being  isolated  from  the  direct  ex- 
perience of  physical  objects,  situations,  and  contexts  that  language  describes.  Such  a 
network  is  necessary  as  a  substrate  for  initial  theory  building  and  to  provide  contexts 
in  which  to  build  instance  descriptions.

3.1.3.8  Language  game

The  social  aspects  of  categorization  play  a  central  role  in  evolution  of  word  usage, 
a  concept  illustrated  in  Wittgenstein's  language  games.  The  consequence  is  that  the 
machine  learning  processes  for  lexical  acquisition  must  always  be  guided  by  human 
interaction.  The  degree  of  human  interaction  in  existing  systems  for  lexical  acquisition 
ranges  from  minimal  [48]  to  dominant  [119].  Since  social  interaction  is  essential,  it 
must  be  included  directly  in  the  design  of  learning  algorithms. 

3.1.3.9  Exception  handling

Exceptions give classes their complex structure and dynamic nature. Instance I
is an exception if it belongs in a class C, but it violates existing class descriptions
for determining membership in C. The converse is also possible. Instance I is an
exception if it is not a member of C, yet it satisfies a class description for membership
in C.
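
The two-sided definition above can be sketched in a few lines of Python. This is only an illustration, not the system's implementation: the names ClassDescription and is_exception, the feature dictionaries, and the toy "bird" rule are assumptions made for the example, and the known membership of each instance is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Instance = Dict[str, bool]  # hypothetical feature representation


@dataclass
class ClassDescription:
    """A class name plus a rule for deciding membership (illustrative only)."""
    name: str
    satisfies: Callable[[Instance], bool]


def is_exception(instance: Instance, cls: ClassDescription, is_member: bool) -> bool:
    """An instance is an exception when its known membership in the class
    disagrees with what the class description says about it."""
    return is_member != cls.satisfies(instance)


# Toy description (an assumption): a bird is something with wings that flies.
bird = ClassDescription(
    "bird", lambda i: i.get("has_wings", False) and i.get("flies", False)
)

robin = {"has_wings": True, "flies": True}
penguin = {"has_wings": True, "flies": False}
bat = {"has_wings": True, "flies": True}

print(is_exception(robin, bird, is_member=True))    # member that satisfies the rule
print(is_exception(penguin, bird, is_member=True))  # member that violates the rule
print(is_exception(bat, bird, is_member=False))     # non-member that satisfies the rule
```

The penguin illustrates the first direction of the definition and the bat the converse.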

The  notion  of  exception  has  received  various  treatments  in  the  past.  A  strong  view 
of  exception  handling  is  proposed  in  which  the  system  automatically  detects  exceptions. 
Then  the  database  structure  must  be  modified  to  account  for  the  exception.  There 
are  two  cases  of  exception  detection: 

1.  In  the  Trivial  Case,  the  system  is  TOLD  that  the  new  instance  belongs  in  a 
class,  yet  it  violates  the  rules  for  membership  in  that  class. 

2.  In  the  General  Case,  the  system  is  NOT  TOLD  that  the  new  instance  is  an 
exception  to  a  specific  class,  but  rather  the  system  is  expected  to  automatically 
identify  the  class  in  the  process  of  detecting  the  exception. 

Case  1  is  trivial  since  most  of  the  work  of  detecting  the  exception  is  removed.  Case 
2  is  difficult.  In  Section  4.2.4,  an  Exception  Condition  is  presented  that  is  necessary 
but not sufficient for identifying exceptions in the general case. It appears that
some manual intervention is necessary for Case 2, although the system is able to
automatically  suggest  that  an  instance  may  be  an  exception  and  identify  the  relevant 
class. 

The  ability  to  handle  exceptions  does  not  simply  mean  that  a  description  of  an 
exception  has  been  added  to  the  class  description.  This  approach  has  been  taken  by 
others  [49,4],  in  that  a  class  description  contains  the  properties  of  a  typical  instance 
plus  some  additional  properties  that  are  associated  with  "exceptions."  This  violates 
the  proposed  notion  of  exceptions  and  defeats  the  purpose  of  identifying  exceptions. 
Such  systems  are  not  dynamic  and  are  still  unable  to  handle  exceptions  that  fail  to 
satisfy  either  the  typical  description  or  the  extended  description  containing  "known 
exceptions." 
3.1.4   Summary 

The  new  category  theory  of  word  meaning  postulates  a  mapping  between  words 
and  categories.  Categories  are  represented  in  memory  as  complex  clusters  of  classes 
and  instances.  Case-based  reasoning  (similarity-based  comparisons)  and  explanation- 
based  learning  (cognitive  models)  are  the  processes  that  form  category  structures. 
Categories  must  be  grounded  through  some  causal  chain  involving  direct  perception 
of  physical  entities.  Any  theory  of  categorization  must  account  for  family  resemblance 
and basic level effects, provide prototypes and default values, and have the ability to
handle  exceptions. 

A  formal  presentation  of  the  new  category  theory  is  given  in  the  next  chapter.  The 
remainder  of  this  chapter  examines  some  contrasting  points  of  view  and  compares 
them  with  the  new  category  theory  of  meaning. 
3.2    Other  Related  Works

3.2.1    Early  Analytical  Philosophers 

The  early  analytical  philosophers  have  a  general  notion  of  categories.  The  dis- 
tinction between  concept  and  object  established  by  Frege  [33]  is  fundamentally  the 
distinction  between  classes  and  instances.  A  similar  distinction  is  made  between  in- 
tension and  extension  [34].  The  intension  corresponds  to  the  sense  of  a  word.  The 
extension  is  the  set  of  things  to  which  a  word  refers.  Categories  form  a  taxonomy, 
which  is  a  subsumption  relationship  between  two  concepts.  Concept  A  subsumes  con- 
cept B  if  the  extension  of  B  is  a  subset  of  the  extension  of  A,  which  is  also  the  case 
if  the  intension  of  B  implies  the  intension  of  A.  Thus  "mammal"  subsumes  "dog." 
Frege  describes  two-level  concepts.  An  object  falls  under  a  first-level  concept;  a  con- 
cept falls  within  a  second-level  concept,  expressing  the  fact  that  a  concept  falls  under 
a  higher  one.  This  is  the  beginning  of  a  taxonomy,  a  taxonomy  which  can  have  many 
levels (thing, organism, animal, mammal, dog, Lassie).
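
The extensional reading of subsumption can be made concrete with sets. The sketch below assumes small, invented extensions for each concept (the individuals Flipper, Tweety, oak, and rock are hypothetical); it shows only the subset test, not the intensional test based on implication.

```python
from typing import Dict, FrozenSet, List

# Hypothetical extensions: the set of individuals each concept refers to.
EXTENSIONS: Dict[str, FrozenSet[str]] = {
    "thing": frozenset({"Lassie", "Flipper", "Tweety", "oak", "rock"}),
    "organism": frozenset({"Lassie", "Flipper", "Tweety", "oak"}),
    "animal": frozenset({"Lassie", "Flipper", "Tweety"}),
    "mammal": frozenset({"Lassie", "Flipper"}),
    "dog": frozenset({"Lassie"}),
}


def subsumes(a: str, b: str) -> bool:
    """Concept a subsumes concept b if the extension of b is a subset of a's."""
    return EXTENSIONS[b] <= EXTENSIONS[a]


def is_taxonomy_chain(concepts: List[str]) -> bool:
    """True if each concept in the list subsumes the one that follows it."""
    return all(subsumes(upper, lower) for upper, lower in zip(concepts, concepts[1:]))


print(subsumes("mammal", "dog"))
print(subsumes("dog", "mammal"))
print(is_taxonomy_chain(["thing", "organism", "animal", "mammal", "dog"]))
```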

The  early  analytical  philosophers  say  very  little  about  mechanisms  for  build- 
ing categories.  They  mainly  tried  to  distinguish  basic  relationships  such  as  object- 
concept.  The  main  analysis  tool  at  the  time  was  conversion  of  sentences  into  logical 
form  [97,124].  This  is  still  a  fundamental  technique  in  natural  language  processing 
today.  The  problem  with  the  logical  form  of  sentences  is  that  it  leads  to  the  clas- 
sical view  of  categories.  The  logical  form  quickly  runs  into  trouble  in  dealing  with 
exceptions.  For  example,  to  conclude  "Joe's  bird  flies"  from  "Joe  owns  a  bird"  by 
using logical formulas is awkward if not impossible. Among the proposed solutions is
an  exhaustive  listing  of  exceptions: 

BIRD(X) AND NOT(PENGUIN(X))
AND NOT(CHICKEN(X))
AND NOT(WOUNDED(X)) ... → FLIES(X)
But the writing of such a formula does not prevent a new exception from coming along
that is not in the formula. Furthermore, the meaning of symbols in the formula (Bird,
Penguin, Chicken, and Wounded in this example) demands elaboration. Often such
elaboration is not given. If it were, it would take the form of more logical formulas:

BLACK.AND.WHITE(X) AND NOT(FLY(X))
AND LIVES.IN.ANTARCTICA(X)
→ PENGUIN(X)

But  this  looks  like  a  simple  dictionary  definition  and  fails  to  capture  even  the  smallest 
portion  of  our  knowledge  associated  with  the  symbol.  Furthermore,  the  items  in  this 
definition  are  not  even  necessarily  true,  and  exceptions  are  easily  identified. 
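
The weakness of an exhaustive exception list is easy to reproduce. The following sketch encodes the bird rule as a Python predicate over sets of facts; the fact names and the ostrich example are assumptions chosen for illustration.

```python
from typing import FrozenSet

# The exceptions the rule author happened to think of.
KNOWN_EXCEPTIONS = frozenset({"penguin", "chicken", "wounded"})


def flies(facts: FrozenSet[str]) -> bool:
    """BIRD(X) AND NOT(each listed exception) -> FLIES(X)."""
    return "bird" in facts and not (facts & KNOWN_EXCEPTIONS)


robin = frozenset({"bird", "robin"})
penguin = frozenset({"bird", "penguin"})
ostrich = frozenset({"bird", "ostrich"})  # an exception nobody listed

print(flies(robin))
print(flies(penguin))
print(flies(ostrich))  # the rule wrongly concludes that the ostrich flies
```

Adding "ostrich" to the list repairs this one case, but the next unlisted exception fails in exactly the same way.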
3.2.2   Late  Wittgenstein 

Wittgenstein's  later  theory  of  word  meaning  presented  in  the  Philosophical  Inves- 
tigations [125]  argues  AGAINST  the  following  points  of  view.  These  points  parallel 
the  classical  view  of  categories.  They  can  also  be  attributed  to  Wittgenstein's  earlier 
views  as  presented  in  the  Tractatus  [124]: 

•  All  words  have  an  essential  "meaning."  A  meaning  is  treated  as  an  entity  with 
independent  existence.  A  meaning  is  attached  to  a  word.  Meanings  are  generally 
considered  to  be  mental  entities,  but  in  extreme  cases  (Plato)  are  considered 
to  be  universal  ideals.  Meanings  are  simple.  Dictionaries  proliferate  this  view 
of  meaning  by  making  us  believe  that  we  can  find  the  meaning  of  a  word  by 
looking  it  up  in  the  dictionary.  The  meaning  is  just  this  dictionary  definition. 

•  We  understand  a  word  when  we  know  its  meaning. 

•  We  are  often  sloppy  in  our  use  of  everyday  language.  For  example,  we  may  try  to 
use  a  particular  word  in  many  different  ways  which  results  in  confusion.  It  is  the 
job of philosophy to clean up language usage by introducing an unambiguous,
pure language, namely logic. We should express ourselves using logical form,
as is suggested in the Tractatus.

•  The  essentialist  view  of  meaning  says  that  we  may  use  a  single  word  in  many 
ways,  but  all  those  ways  have  something  in  common,  namely  the  meaning. 
(Homonymity  is  actually  a  case  of  two  different  words  having  the  same  sign.) 
Thus,  the  meaning  is  an  essence  that  can  be  extracted  from  all  the  ways  the 
word is used. In the strongest sense, the essentialist view says that we can state
NECESSARY  AND  SUFFICIENT  conditions  for  using  a  word. 

3.2.2.1    Definition  of  "game" 

Wittgenstein  attacks  the  essentialist  view  by  showing  that  it  cannot  be  applied  in 
practice to everyday words. Consider the word "game." According to the essentialist
view,  there  must  be  something  that  all  the  things  we  call  "game"  have  in  common. 
Yet  as  we  consider  more  and  more  examples  of  "game,"  we  find  less  and  less  in 
common.  The  extension  of  the  word  "game"  is  so  broad  that  there  are  no  necessary 
and  sufficient  conditions  which  can  be  given  for  identifying  games. 

Even  if  there  are  essential  elements  of  games,  they  would  be  so  minimal  that  they 
would  not  contain  much  information.  The  fact  that  all  board  games  "make  use  of  a 
board"  is  hardly  enlightening.  As  the  number  of  necessary  and  sufficient  conditions 
given for being a game becomes smaller and smaller, there is a danger of overextending
the  definition.  People  try  to  prove  that  Wittgenstein  is  wrong  by  providing  ingenious 
definitions  for  "game,"  such  as  "GAME:  An  activity  that  is  engaging  and  diverting, 
deliberately detached from real life." This recent definition is by the well-known
artificial  intelligence  researcher  Marvin  Minsky  [79].  This  definition  overextends  the 
concept  "game"  since  it  applies  to  activities  that  are  not  games,  for  example,  watching 
a  movie.  Never  in  recorded  history  has  a  child  ever  asked  her  father,  "Daddy,  can  we 
engage in an activity which is diverting and deliberately detached from real life?"

The same argument against essentialism in semantics applies to all the other levels
of  language  analysis.  For  example,  the  essentialist  argument  applied  to  syntax  would 
say  that  there  must  be  a  finite  number  of  grammar  rules  that  all  sentences  obey.  Yet, 
it  is  well  known  that  "all  grammars  leak"  [100].  For  any  fixed  set  of  grammar  rules, 
it  is  always  possible  to  find  intelligible  expressions  that  violate  these  rules. 
3.2.2.2   Family  resemblance  and  meaning  through  usage 

If essentialism fails, then what is meaning? Here Wittgenstein introduces the
concept  of  family  resemblance.  All  games  are  not  related  by  a  fixed  conjunction  of 
necessary  and  sufficient  conditions,  but  rather  by  a  vague,  almost  intangible,  family 
resemblance.  Games  are  related  by  a  complex  of  overlapping  features,  not  one  of 
which  all  games  have  in  common.  It  is  like  the  family  in  which  the  brothers  have 
a  big  nose,  the  children  all  have  red  hair,  and  the  husbands  are  alcoholics.  And  of 
course,  a  new  child  may  be  born  (or  adopted!)  into  the  family  at  any  time  that 
introduces  new  and  possibly  different  features. 

Family  resemblance  is  the  idea  that  categories  (and  word  meanings)  are  not  created 
by  conjunctions  of  necessary  and  sufficient  conditions.  Rather  the  members  of  a 
category  are  related  by  a  vague,  often  difficult  to  describe  resemblance  in  the  same 
way  the  members  of  a  family  are  related  by  appearance. 
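
A small data example makes the structure of family resemblance visible. In the sketch below the feature sets are invented for illustration; the point is only that the members can be chained together by pairwise overlaps even though no single feature is shared by all of them.

```python
from typing import Dict, Set

# Hypothetical feature sets; the features are assumptions for the example.
GAMES: Dict[str, Set[str]] = {
    "chess": {"board", "competition", "skill"},
    "poker": {"cards", "competition", "luck"},
    "solitaire": {"cards", "skill", "amusement"},
    "ring-a-ring-o'-roses": {"amusement", "group", "luck"},
}


def common_features(members: Dict[str, Set[str]]) -> Set[str]:
    """Features shared by every member (the would-be 'essence')."""
    return set.intersection(*members.values())


def linked_by_resemblance(members: Dict[str, Set[str]]) -> bool:
    """True if every member can be reached from the first one through a
    chain of pairwise feature overlaps."""
    names = list(members)
    seen = {names[0]}
    frontier = [names[0]]
    while frontier:
        current = frontier.pop()
        for other in names:
            if other not in seen and members[current] & members[other]:
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(names)


print(common_features(GAMES))        # empty: no feature is common to all
print(linked_by_resemblance(GAMES))  # yet the members hang together
```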

Another  useful  analogy  by  Wittgenstein  is  the  cities  analogy  of  word  meaning. 
When  a  new  city  begins,  just  as  a  new  word,  it  has  a  simple,  well-defined  boundary. 
As  the  city  grows,  it  expands  into  neighborhoods,  and  eventually  into  suburbs.  Very 
common  words  are  like  major  metropolitan  areas.  This  is  a  useful  picture  of  the 
structure  of  a  complex  category.  And  of  course,  new  territory  may  be  annexed  at  any 
time. 

Another  attack  on  the  classical  view  of  categories  and  word  meaning  is  summarized 
in  the  following  quotation  from  the  Brown  Book  [126]: 
"It is, when I let the face make an impression on me, as though there
existed  a  double  of  its  expression,  as  though  the  double  was  the  PROTO- 
TYPE of  the  expression  and  as  though  seeing  the  expression  of  the  face 
was  finding  the  prototype  to  which  it  corresponded-as  though  in  our  mind 
there  had  been  a  mold  and  the  picture  we  see  had  fallen  into  that  mold, 
fitting  it.  But  it  is  rather  that  we  let  the  picture  sink  into  our  mind  and 
make  a  mold  there.  [Part  II  Sec  16]" 


We  do  not  recognize  words  (or  faces)  the  way  we  recognize  a  criminal,  from  a  picture 
in  the  post  office.  We  do  not  have  fixed  molds  in  the  form  of  definitions  into  which 
the  word  can  fall.  The  word  "sinks  into  our  mind"  and  makes  a  new  mold  each  time 
the  word  is  used. 

But  if  we  do  not  have  preconceived  prototypes,  how  do  we  understand  a  word? 
Certainly  we  must  have  something  in  our  minds  that  responds  when  we  hear  or  see  a 
word.  In  the  new  category  theory,  this  function  is  provided  by  the  case-based  memory. 
What  we  do  is  remember  previous  cases  in  which  the  word  was  used,  and  try  to  apply 
them  to  the  new  case.  But  the  new  case  is  always  slightly  different  from  previous 
cases  and  does  not  fit  them  exactly.  It  creates  a  new  mold  each  time. 
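
The retrieve-then-store behavior described here can be sketched as a tiny case memory. This is a hedged illustration only: the class CaseMemory, the feature sets, and the use of Jaccard overlap as the similarity measure are assumptions, not the mechanism defined by the category theory.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

Case = Tuple[str, FrozenSet[str]]  # (utterance, observed features)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CaseMemory:
    """Remembers every use of a word; nothing is matched against a fixed mold."""

    def __init__(self) -> None:
        self.cases: List[Case] = []

    def understand(self, utterance: str, features: Iterable[str]) -> Optional[str]:
        """Return the most similar remembered utterance (or None), then store
        the new case so that it shapes later retrievals."""
        feats = frozenset(features)
        best = max(self.cases, key=lambda c: jaccard(c[1], feats), default=None)
        self.cases.append((utterance, feats))
        if best is None or jaccard(best[1], feats) == 0.0:
            return None
        return best[0]


memory = CaseMemory()
print(memory.understand("The dog chased the cat", {"dog", "animal", "chase"}))
print(memory.understand("A dog barked", {"dog", "animal", "bark"}))
print(memory.understand("The dog barked all night", {"dog", "bark", "night"}))
```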

For  Wittgenstein,  words  do  not  have  "meanings"  where  meanings  are  independent 
entities  we  can  describe  or  point  to.  For  Wittgenstein,  we  know  a  word  by  knowing 
a  set  of  sentences  in  which  the  word  was  used,  and  knowing  how  the  word  was  used 
in  each  sentence.  This  is  the  basis  of  the  "usage"  theory  of  word  meaning.  Thus, 
a  new  utterance  containing  a  word  only  makes  sense  to  us  if  we  can  relate  the  new 
utterance  to  previous  utterances  via  some  regularity  or  family  resemblance. 

It  was  Wittgenstein's  identification  of  family  resemblance  that  caused  the  re- 
evaluation  of  the  classical  view  of  categories.   This  idea  greatly  influenced  efforts  in 
cognitive  psychology  [96].   It  also  greatly  influenced  formation  of  the  new  category 
theory  presented  in  Section  3.1. 
3.2.2.3   Similarity-based  reasoning  and  intensional  semantics 

Intensional  semantics  is  based  on  the  notion  that  the  sense  or  meaning  of  a  word 
can be defined in formal terms by stating a rule [14] or function [66]. The rule or
function  can  be  applied  to  determine  to  which  entities  a  word  applies.  Intensional 
semantics  is  another  form  of  essentialism,  and  therefore  has  no  basis  according  to 
Wittgenstein.  We  cannot  determine  the  extension  of  a  term  by  applying  a  rule  or 
function.  Postulating  the  existence  of  an  intension  is  like  postulating  a  meaning  as  an 
entity  according  to  the  classical  view.  For  this  reason,  Wittgenstein  never  discusses 
the  existence  of  entities  like  cognitive  models. 

A  problem  with  family  resemblance  is  that  it  is  based  entirely  on  resemblance,  that 
is,  on  measures  of  similarity.  This  assumes  that  we  can  make  similarity  judgments 
without the use of internal criteria, laws, or "prototypes." But what it means to be
similar,  what  it  means  to  have  identifiable  features  that  can  be  used  as  a  basis  for 
comparison,  is  biased  by  our  beliefs.  We  do  not  observe  the  world  objectively;  our 
system of beliefs influences what we think we see and hear.

This  leads  again  to  the  notion  of  "cognitive  models."  Thus,  we  believe  that  all 
bachelors  are  unmarried  males  in  spite  of  known  exceptions  (the  Pope).  And  Uncle 
Rochester,  well,  he  is  not  really  my  uncle,  just  a  good  friend  of  my  parents.  The  hedge 
really is made on the basis of a cognitive model. What could be more obvious than
the fact that birds fly? Is the Pope Catholic? Yet from examining the extension of
"bird," we learn that not all birds fly. It is our cognitive model of birds that enables
us to make defeasible statements like "birds fly."

The notion of cognitive models is not discussed by Wittgenstein; rather, it appears
much later as a criticism of systems based entirely on similarity or family resemblance
[84]. Now it is easy to mistake these cognitive models for intensions or essentialist
necessary and sufficient conditions, and thus to come full circle. Cognitive
models are not absolute: they do not necessarily define the extension of a term,
and they do not correspond to the intensions of intensional semantics. Typically, there
are many different, often conflicting, cognitive models associated with a concept.
Furthermore,  they  are  not  a  substitute  for  similarity-based  comparisons;  rather  they 
work  in  conjunction  with  such  comparisons.  Likewise,  we  can  create  word  definitions 
and  try  to  apply  these  definitions.  But  it  is  also  a  mistake  to  think  that  any  definition 
captures  the  word's  meaning. 
3.2.3   Ziff's  Semantic  Analysis

Semantic  Analysis,  by  Paul  Ziff  [134],  is  particularly  relevant  to  the  theory  of 
categories  proposed  in  Section  3.1  because  of  its  orientation  to  the  analysis  of  a  corpus 
of  utterances.  Ziff  takes  sets  of  utterances  in  which  a  particular  word  occurs  as  the 
data  to  be  analyzed.  Issues  of  language  are  resolved  by  citing  examples  of  usage.  This 
corresponds  to  the  case-based  reasoning  analysis  phase  in  the  theory  of  categories. 
It  is  identical  to  the  lexical  acquisition  technique  by  which  the  various  senses  of  a 
word  are  obtained  by  analyzing  large  numbers  of  sentences  containing  the  word.  It 
contrasts  with  other  forms  of  semantic  analysis  which  rely  on  abstract,  relatively 
simple  dictionary  style  word  definitions.  The  latter  are  too  limited  to  capture  the 
diversity  of  usage,  and  are  inflexible  when  it  comes  to  understanding  usage  patterns 
that fall outside the narrow confines of rules. Ziff begins to provide an explanation
about  how  such  deviant  utterances,  those  which  break  the  traditional  rules  of  language 
and  which  would  be  considered  exceptions  in  category  theory,  can  still  have  meaning. 
Yet  Ziff  stops  short  of  a  full  account  of  both  "deviant"  and  "non-deviant"  utterances. 
The  new  theory  of  categories  takes  his  ideas  much  further. 
3.2.3.1    Regularity 

Languages exhibit regularities at all levels. The fact that modifiers usually (but
not always) precede the noun (in English we say "red apple" and not "apple red") is
one example of a regularity. Regularity is very similar to the notion of family
resemblance. Just as Wittgenstein argues that there are no fixed criteria for determining
class membership, Ziff argues that there are no rules, only regularities in language.
The ability to recognize regularities is an important factor in category theory, and
for Ziff it is a very important factor in understanding language. That regularities
are  evident  at  all  levels  of  language  understanding  may  not  be  clear  at  first,  since 
we  are  used  to  thinking  in  terms  of  fixed  rules.  But  consider  the  simple  example 
of  understanding  a  person  who  speaks  the  same  language  (English)  but  with  a  very 
strong  dialect  (Southern): 

Y'all  com  ahn  dawn  fer  dinnar. 

We  understand  different  dialects  largely  because  of  the  regularities  in  phonetics 
which  allow  a  mapping  between  utterances  in  different  dialects.  It  would  be  difficult  if 
not  impossible  to  identify  rules  that  would  dictate  this  mapping.  Neither  does  it  seem 
that  we  are  born  with  such  rules,  for  it  is  not  likely  that  we  will  encounter  exactly 
the  same  pronunciation  more  than  once.  The  machinery  for  language  understanding 
must  accept  novelty  as  the  norm. 
3.2.3.2   The  problem  of  exception  handling 

As discussed in Section 3.1.3.9, there is a problem with exception handling. If we
encounter  a  new  and  unusual  utterance,  it  may  seem  similar  to  an  already  familiar 
utterance,  yet  it  is  unclear  whether  the  new  utterance  is  simply  a  deviation  from  the 
familiar  utterance  or  something  entirely  different.  Ziff  raises  this  problem  in  section 
72  of  Semantic  Analysis.  The  problem  is  that  an  exception  may  belong  to  an  existing 
class  or  may  represent  the  start  of  a  new  class.  On  first  encountering  an  unusual 
chair,  perhaps  a  bean-bag  chair,  we  had  to  decide  either  1)  This  is  a  chair,  but  an 
unusual  one,  or  2)  This  is  not  a  chair,  it  is  some  new  kind  of  furniture. 

It is not clear how to choose between the two cases. One could appeal to agreement
from other language speakers or to an authority on language (a solution which
Ziff rejects). There have been various attempts to resolve the issue in favor of category
quality. That is, put the new utterance into the class that results in the most
coherent  structure  overall.  This  is  related  to  what  makes  a  good  theory.  Perhaps  the 
best  class  structure  is  a  simple  class  structure.  Perhaps  the  best  class  structure  is 
one that minimizes within-class variation and maximizes between-class variation. It
is interesting that Ziff recognized this problem, because it occurs frequently in the
category  theory.  It  is  also  discussed  by  Fodor  [31]  as  the  disjunction  problem.  Some 
proposed  solutions  to  this  problem  are  discussed  further  in  Section  4.2.4. 
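
One of the quality criteria mentioned above, low within-class variation and high between-class variation, can be sketched for the bean-bag chair. Everything in this example is an assumption made for illustration: the feature sets, the Jaccard distance, and the score (mean between-class distance minus mean within-class distance). It is not the solution developed in Section 4.2.4.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical feature descriptions of four pieces of furniture.
FEATURES: Dict[str, FrozenSet[str]] = {
    "kitchen_chair": frozenset({"legs", "seat", "back", "rigid"}),
    "armchair": frozenset({"legs", "seat", "back", "arms", "rigid"}),
    "table": frozenset({"legs", "top", "rigid"}),
    "bean_bag": frozenset({"seat", "soft"}),
}


def distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance: 0.0 for identical feature sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def quality(partition: Dict[str, List[str]]) -> float:
    """Mean between-class distance minus mean within-class distance."""
    label = {item: cls for cls, items in partition.items() for item in items}
    within: List[float] = []
    between: List[float] = []
    for a, b in combinations(label, 2):
        d = distance(FEATURES[a], FEATURES[b])
        (within if label[a] == label[b] else between).append(d)
    return mean(between) - mean(within)


# Option 1: the bean bag is an unusual chair.
unusual_chair = {"chair": ["kitchen_chair", "armchair", "bean_bag"], "table": ["table"]}
# Option 2: the bean bag starts a new class of furniture.
new_class = {"chair": ["kitchen_chair", "armchair"], "table": ["table"], "new": ["bean_bag"]}

print(round(quality(unusual_chair), 3))
print(round(quality(new_class), 3))
```

With these made-up features the score prefers the new class, but a score of this kind always rewards singleton classes, which have no within-class variation at all. That bias is one reason such a criterion cannot settle the question by itself.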

3.2.3.3  Conditions 

Each  utterance  is  accompanied  by  a  set  of  conditions  or  presuppositions.  These  are 
propositions  which  must  be  true  for  the  utterance  to  have  meaning,  and/or  which  set 
the  context  for  selecting  from  among  several  possible  interpretations  of  the  utterance. 
Thus,  the  utterance  also  includes  the  surrounding  context  represented  as  conditions. 

3.2.3.4  Method  of  analysis 

For  Ziff,  the  essence  of  analyzing  word  meanings  is  contained  in  three  steps.  First, 
obtain  as  complete  a  corpus  as  possible  of  utterances  containing  the  word  (called 
the  distributive  set),  and  likewise  obtain  as  large  a  set  as  possible  of  utterances 
related  to  but  contrasted  with  this  distributive  set  (called  the  contrastive  set).  The 
contrastive  set  consists  of  utterances  from  the  distributive  set  in  which  the  word  being 
analyzed  has  been  replaced  by  a  different  word,  resulting  in  a  new  utterance  that  is 
still  understandable.  For  example,  in  the  analysis  of  the  word  "tiger,"  the  utterance, 
"That  is  a  tiger,"  is  contrasted  by  the  utterance  "That  is  a  lion."  The  purpose  of 
examining  the  contrasting  utterances  is  to  determine  how  the  conditions  change,  that 
is,  how  the  conditions  differ  between  the  distributive  and  contrastive  sets. 

The  second  step  is  to  compare  and  contrast  the  distributive  and  contrastive  sets. 
This consists in identifying a set of conditions that apply to an element of the distributive
set, but differ from the conditions for the corresponding element in the contrastive
set.  This  disambiguation  is  conducted  for  each  pair  of  elements  from  the  distributive 
and  contrastive  sets. 

The third step in analysis is to consolidate the results of the analysis over all elements.
This is where Ziff believes it is possible to produce one or more generalizations
or hypotheses about the usage of the word.
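
The three steps can be outlined in code. The sketch below is a loose reading of the method, not Ziff's own procedure: conditions are represented as plain sets of strings supplied by the analyst, the example conditions are invented, and consolidation is reduced to a set intersection.

```python
from typing import List, Set, Tuple

Conditions = Set[str]


def substitute(utterance: str, word: str, replacement: str) -> str:
    """Step 1: form a contrastive utterance by swapping the analyzed word."""
    return utterance.replace(word, replacement)


def differing_conditions(pairs: List[Tuple[Conditions, Conditions]]) -> List[Conditions]:
    """Step 2: for each pair, keep the conditions that hold for the
    distributive utterance but not for its contrastive counterpart."""
    return [dist - contr for dist, contr in pairs]


def consolidate(differences: List[Conditions]) -> Conditions:
    """Step 3: a crude generalization, the conditions shared by every pair."""
    return set.intersection(*differences) if differences else set()


distributive = ["That is a tiger", "The tiger slept"]
contrastive = [substitute(u, "tiger", "lion") for u in distributive]

# Hypothetical conditions for each (distributive, contrastive) pair.
pairs = [
    ({"large cat", "striped"}, {"large cat", "maned"}),
    ({"large cat", "striped", "solitary"}, {"large cat", "maned"}),
]

print(contrastive)
print(consolidate(differing_conditions(pairs)))
```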

3.2.3.5  Ziff  ignores  the  category  effects

Ziff appears to be unaware of or unconcerned with theories of categorization.
Nowhere  in  Semantic  Analysis  are  categories  or  classes  mentioned.  The  most  likely 
point  to  inject  a  theory  of  categories  would  be  in  the  third  step  of  his  meaning  analysis. 
He  compares  this  step  to  creating  a  dictionary  entry.  A  dictionary  entry,  with  multiple 
senses,  is  like  a  category  with  multiple  subcategories.  However,  dictionary  definitions 
are only the tip of the iceberg. Through the analysis of a corpus, it is easy to show
that  standard  word  usage  has  a  diversity  which  greatly  exceeds  the  number  of  senses 
given  in  even  high  quality  dictionaries.  Without  a  theory  of  categorization,  there  is 
no  suitable  account  of  this  diversity. 

3.2.3.6  Deviant  versus  non-deviant  utterances

It  is  remarkable  how  severely  language  can  be  corrupted  and  still  be  understand- 
able, as  in  the  following: 

Cn  u  rd  dis? 
Chile  today,  Hot  tamale. 

Does  it  make  sense  to  talk  about  an  utterance  as  being  deviant  or  non-deviant? 
For  example,  is  the  utterance  "That  cat  is  on  the  mat"  non-deviant,  and  "The  cat 
be  on  the  mat"  deviant?  Both  sentences  are  understandable.  In  certain  passages, 
Ziff denies that there can be any such distinction. That is, it is not possible to
establish an authority capable of judging the difference between deviant and
non-deviant utterances. Yet Ziff often tries to distinguish between deviant and
non-deviant utterances and concentrates his analysis primarily on the latter. In fact, he
intentionally  avoids  the  analysis  of  deviant  utterances: 

"Matters  become  vastly  more  complex  when  we  consider  deviant  utter- 
ances, e.g.  syntactically  deviant  utterances  such  as  "I  wrote  it  a  grief 
ago."...,  or  semantically  deviant  utterance  such  as  "He's  a  bright  fellow." 
That  frequently  a  word  has  here  one  sense,  there  another,  is  an  impor- 
tant, a  vital  fact  about  words.  I  do  not  claim  to  explain  it  fully  here.  Such 
problems  are  beyond  the  scope  of  this  essay.  [Section  187]" 

In concentrating on supposedly non-deviant utterances, Ziff seems to oversimplify
the  analysis  of  polysemy.  In  his  analysis  of  the  word  "good"  in  chapter  VI,  the  first  80 
utterances  containing  "good"  are  explained  by  a  single  hypothesis.  It  is  odd  that  Ziff 
would  attempt  to  simplify  things  in  such  a  way,  since  analysis  of  sets  of  utterances 
gives  the  best  evidence  of  the  great  diversity  of  ways  in  which  a  single  word  can  be 
used. 

In  the  Category  Theory,  the  machinery  is  available  to  handle  "deviant"  utterances 
in  a  very  natural  way.  Categories  are  taken  to  be  highly  diversified  groupings,  and 
word  meanings  are  treated  in  the  same  way.  For  example,  Lakoff  [61]  reports  an 
analysis  of  the  word  "over"  in  which  in  excess  of  100  different  usages  have  been 
identified.  Such  diversity  in  usage  is  not  deviant  or  at  all  unusual.  It  is  readily 
observed  in  everyday  speech.  That  highly  corrupted  language  is  still  understandable 
is  a  result  of  the  non-deductive  recognition  resulting  from  similarity  comparisons. 
3.2.3.7   How  Ziff's  analysis  is  compatible  with  categories 

That  regularities,  not  fixed  rules,  are  at  work  in  recognizing  the  similarity  between 
two  utterances  such  as: 

Mat  the  cat  the. 
Master  Castor. 
is crucial to our ability to understand both deviant and non-deviant utterances. The
nature  of  these  regularities  needs  to  be  studied  in  more  detail.  Ziff  only  presents  an 
informal, intuitive analysis of regularities. The new category theory offers a possible
formal analysis in that the criteria for placing things in categories are based heavily on
similarity or family resemblance rather than fixed rules. General rules work well for
non-deviant  utterances.  Perhaps  that  is  what  makes  them  seem  to  be  non-deviant. 
That  is,  rules  are  generalized  patterns  that  exist  because  a  significant  number  of 
utterances  fit  a  particular  pattern.  A  rule  should  be  interpreted  as  a  description 
rather than a definition. Thus, the rule S → NP, VP is a pattern describing a great
number  of  sentences,  but  it  is  not  a  rule  dictating  the  format  for  all  sentences.  There 
are  utterances  that  violate  rules  and  are  still  understandable  because  they  can  be 
mapped  directly  onto  known  expressions. 

In  the  new  category  theory,  a  formal  mapping  between  two  instances  (two  utter- 
ances) can  be  defined.  This  effect  is  easiest  to  see  in  the  way  analogy  and  metaphor 
are  handled.  In  the  analogy,  "An  atom  is  like  the  solar  system,"  there  is  a  structural, 
syntactic  mapping  between  the  symbols  representing  the  nucleus  and  the  sun,  the 
electrons  and  the  planets,  and  between  electron  spins  and  orbits.  Such  mappings  can 
be  created  at  all  levels  of  language  understanding  (phonetic,  morphologic,  syntac- 
tic, semantic,  and  pragmatic).  It  is  a  fundamental  service  provided  by  a  system  of 
categories. 
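
A structural mapping of this kind can be sketched with relation triples. The relations listed for the atom and the solar system, and the function name carried_over, are assumptions for the example; the sketch shows only that renaming symbols through the mapping carries some relations from one description into the other.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical structural descriptions of the two domains.
ATOM: Set[Relation] = {
    ("electron", "orbits", "nucleus"),
    ("nucleus", "attracts", "electron"),
    ("electron", "has_charge", "negative"),
}
SOLAR_SYSTEM: Set[Relation] = {
    ("planet", "orbits", "sun"),
    ("sun", "attracts", "planet"),
    ("sun", "emits", "light"),
}
MAPPING: Dict[str, str] = {"nucleus": "sun", "electron": "planet"}


def carried_over(
    source: Set[Relation], target: Set[Relation], mapping: Dict[str, str]
) -> Set[Relation]:
    """Relations of the source that also hold in the target once the
    source symbols are renamed through the mapping."""
    renamed = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in source}
    return renamed & target


print(sorted(carried_over(ATOM, SOLAR_SYSTEM, MAPPING)))
```

The relations that do not carry over (charge, light) mark where the analogy stops.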

According  to  the  new  category  theory,  we  understand  a  new  utterance  because 
of  its  relationship  to  previous  utterances.  When  we  hear  a  new  utterance,  we  try  to 
remember  previous  utterances  that  are  somehow  similar  to  the  new  utterance,  and 
then  map  our  understanding  of  the  previous  utterances  to  the  new  utterance.  For 
example,  consider  this  metaphor: 

"In  1960,  the  small  town  of  Orlando  was  cocooned  by  citrus  groves." 
Now I have never heard the word "cocoon" used as a verb, much less apart from
the  role  it  plays  in  insect  development.  Yet  I  can  understand  this  sentence.  I  can 
do  so  because  of  the  grammatical  regularities,  namely,  because  of  the  "-ed"  ending 
on  "cocooned,"  and  also  by  its  position  in  the  sentence,  which  suggests  that  it  is  a 
verb.  More  correctly,  this  sentence  resembles  other  sentences  that  have  a  verb  in  the 
position  occupied  by  "cocoon."  I  can  also  understand  Orlando  being  cocooned  by 
a  citrus  grove  by  a  mapping  from  what  I  already  know  about  cocoons,  namely  that 
they  act  as  a  protective  covering  for  an  insect  pupa.  In  this  case,  the  town  maps  to 
the  insect,  and  the  citrus  groves  maps  to  the  protective  covering.  Thus,  Orlando  was 
surrounded  by  citrus  groves  as  if  the  groves  were  offering  the  protection  of  a  cocoon. 

The role of categories is to cluster sets of previous utterances into groups of
similarity. Why do we need categories? In order to make inferences about things we
are not told. Thus, I was never told that "cocoon" could be used as a verb. Yet
by recognizing the sentence above as belonging to a class of passive sentences (based
only on similarity to the pattern (is) (verb + -ED)), I could immediately infer a number of
properties. For example, the object of the by-prepositional phrase is the subject and agent
of the sentence. By association with previous utterances of "cocoon," I could infer
the relationship between orange groves and Orlando. We use categories to anticipate
the properties of an entity based on only partial information about the entity.
Such inference is, of course, subject to exceptions, but is consistent with the available
evidence.

Note  also  that  intensional  semantics  would  have  failed  to  interpret  this  sentence. 
Any  intension  would  have  been  fixed  to  the  "meaning"  of  cocoon  in  its  role  as  insect 
pupa.  It  could  not  have  anticipated  this  exceptional  use.  In  general,  intensional 
semantics  cannot  deal  with  exceptions.  Yet  language  is  dynamic.  New  and  unusual 
usage  patterns  are  encountered  frequently  in  everyday  language. 

3.2.4 Connectionist Models of Computation

Very recently there has been a great deal of interest in connectionist models,
including massively parallel processing and neural networks. Proponents claim that
connectionist models are alternatives to traditional symbolic logic-based approaches
to language understanding and intelligence in general. Philosophers such as Dreyfus
[18], and psychologists such as Lakoff [61] are sharply critical of the symbolic
approach. Does the connectionist model offer a real alternative to representing word
meaning? Fodor and Pylyshyn [32] have argued extensively against the idea that
connectionist models offer anything new. In this section, the possible relevance of the
connectionist approach to language understanding will be briefly examined.

It is important to separate two subclasses of connectionist models. One subclass,
represented by systems such as NETL [22], implements traditional semantic networks
on parallel processors. One symbol, complex object, or node in the semantic network
is assigned to each processing element. The symbols are processed in parallel
rather than serially as is done using traditional computers, but the result is the same.
Thus, this subclass of connectionist models does not offer any real representational
improvements.

The second class, neural networks, is more interesting. Neural networks consist
of processing elements which model real neurons in that the state of each element
is determined by numerous inputs from other elements. Inputs are added together
according to weights which may be inhibitory or excitatory. A neuron "fires" if the
inputs sum to a certain threshold. Neural networks do not contain symbols in the
traditional computing sense. Rather, the network's behavior is determined by
weights and voltage levels distributed over large numbers of processing elements.
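
The threshold behavior described above can be illustrated with a minimal sketch. The function name, the weights, and the threshold are illustrative assumptions rather than values taken from any of the systems discussed; positive weights stand for excitatory connections and negative weights for inhibitory ones.

```python
def fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold


# Hypothetical element with two excitatory inputs and one inhibitory input.
weights = [0.6, 0.6, -1.0]

only_excitatory = fires([1, 1, 0], weights, threshold=1.0)
with_inhibition = fires([1, 1, 1], weights, threshold=1.0)
print(only_excitatory, with_inhibition)
```

Note that nothing in the element refers to a symbol: its behavior is fixed entirely by the numeric weights and the threshold.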

In  order  to  understand  how  neural  networks  can  be  used  in  natural  language 
processing,  it  is  necessary  to  study  some  current  research.  McClelland  and  Kawamoto 
[71] trained a neural network to convert a natural language syntactic input into a
semantic  output  consisting  of  attribute-value  pairs.  The  input  was  already  coded 
into  a  syntactic  notation  (with  attributes  such  as  subject,  object,  and  prepositional 
phrases  already  marked),  and  the  output  was  in  a  conceptual  dependency  notation 
(actor,  action,  instrument).  In  order  to  train  a  neural  network,  sample  patterns 
are  placed  on  the  input.  The  network  is  stimulated  to  adjust  connection  weights 
until  the  output  reaches  a  desired  state,  namely  a  state  corresponding  to  a  semantic 
pattern  that  matches  the  input.  After  training,  novel  input  will  (hopefully)  produce 
the  correct  output.  Even  if  the  training  works,  there  is  one  major  objection  to  this 
approach.  It  assumes  that  language  is  represented  using  attribute-value  pairs  and 
conceptual  dependencies.  This  is  precisely  the  symbolic  approach  to  which  neural 
networks  are  supposed  to  provide  an  alternative.  McClelland  and  Kawamoto  have 
at  best  succeeded  in  building  a  very  fast  parser,  but  have  failed  to  offer  new  insights 
into  language  understanding. 

Another similar approach by Dyer [19] called symbol recirculation trains the network
with input and output patterns, but does not assign attribute-value pairs. The
input consists of words, but rather than assign features to words, each word is
initially a random bit pattern. As the word is used in more and more contexts, the
bit pattern is altered until eventually a stable set of values exists. After training,
words with similar usage, such as cheese/pasta or hatchet/hammer, have similar bit
patterns.

Two claims were made about this symbol recirculation approach. First,

"...the representation of each word carries a memory trace of all the contexts
of use that serve to define it."

Second,

"...we want learning systems in which statistical associations are formed
automatically."

These are precisely the sort of claims that would be made about the approach to
representing  word  meaning  which  was  presented  in  Section  3.1,  and  that  is  a  symbolic 
approach.  Word  meaning  is  obtained  by  gathering  (or  remembering)  and  analyzing 
all  the  situations  in  which  the  word  is  used.  Default  reasoning  is  a  sort  of  statistical 
association,  and  the  approach  lends  itself  to  automatic  lexical  acquisition.  Thus,  it 
is  not  necessary  to  have  a  neural  network  to  obtain  these  desirable  characteristics. 

At  this  time  it  is  unclear  just  how  the  connectionist  approach  offers  an  alternative 
to  the  symbolic  approach.  It  may  be  that  the  approach  outlined  in  Section  3.1  will 
provide  a  formal  basis  for  what  is  going  on  in  neural  networks.  It  may  be  (and  we 
should  hope)  that  the  algorithms  needed  for  language  understanding  will  run  faster 
on  a  connectionist  machine,  but  that  is  an  implementation  issue  and  does  not  impact 
the  approach.  What  is  needed  is  a  theory  of  word  meaning.  That  theory  should  be 
developed  independently  of  the  type  of  computer  hardware  on  which  natural  language 
processing  systems  will  ultimately  run. 


CHAPTER 4
TOWARDS DATABASE SCHEMA GENERATION THROUGH CONCEPTUAL CLUSTERING

4.1    Introduction 


The  theories  of  categorization  introduced  in  the  previous  chapter  will  now  be 
incorporated  into  a  formal  model  for  conceptual  clustering.  A  conceptual  clustering 
algorithm  is  presented  which  assists  in  building  and  maintaining  a  database  schema. 
The  resulting  database  structure  is  a  more  accurate  representation  of  categories  than 
was  provided  by  the  terminological  knowledge  representation  system  presented  in 
Chapter  2. 

The algorithm can automatically build a database schema by clustering together
groups of instances and classes. A class structure evolves as a tradeoff between
similarity-based comparisons (case-based reasoning) and cognitive models of class
structure (explanation-based learning). Conflicts between similarity and model-based
comparisons are created by exceptions. The algorithm includes procedures for
identifying and handling exceptions. The resulting classes exhibit the family resemblance
effect. Methods for generating default values and prototypes are presented.

Cognitive  models  are  represented  by  class  descriptions  which  are  entered  manually. 
These  correspond  to  DEFINED  classes  in  CANDIDE.  Though  many  other  structures 
for  representing  cognitive  models  are  possible  (Chapter  6),  the  discussion  will  be 
limited  to  class  descriptions. 

An instance may be assigned to a class either because it meets a class description
or because it is similar to other instances in that class. The Subsumption function
(Section 2.3) is used to compare two class descriptions. The Realization function is
used to determine whether an instance meets an existing class description. A new
function, INTERSECT, is introduced to compare the similarity of two instances.
INTERSECT is used in defining an exception condition. Exception handling is necessary
if the schema is to evolve to respond to changing requirements.

The ideas in this chapter can be applied as follows:

•  They provide a more accurate notion of a class in database modeling by
incorporating theories of categorization.

•  They improve the prospects for semi-automated schema design by learning from
observation of database instances.

•  They extend the capabilities of class-based models by allowing better exception
handling, default reasoning, and analogical reasoning.

The conceptual clustering algorithm is presented in the next section, including
INTERSECT, exception handling, schema evolution, and generation of default values
and prototypes. Related work in conceptual clustering is discussed in Section 4.3.

4.2   Conceptual  Clustering  Algorithm 

The  purpose  of  the  clustering  algorithm  is  to  assign  instances  to  a  class.  In 
the process, existing classes may be modified (schema evolution) and new classes
formed.  The  process  is  incremental  in  that  each  new  instance  or  class  is  being  added 
to  an  existing  database.  The  structure  (schema)  of  the  database  must  be  altered  to 
account  for  the  new  instance  or  class.  The  process  is  conceptual  in  that  it  is  based 
on  a  comparison  of  the  structures  of  database  objects  which  represent  concepts. 

The main components of the clustering algorithm are outlined in Figure 4.1.
Cognitive models and explanation-based learning are used in steps 1, 2.1, and 2.4.
Case-based reasoning is used in steps 2.2 and 2.3. The following sections describe the
data model used to build instances and class descriptions, the insertion of a new class,
the insertion of a new instance including INTERSECT, the Exception Condition, and
EVOLVE, and the relationship to default values and prototypes.

1.  Introduce a New Class

1.1.  Use SUBSUME and Classification to determine the relationship
between the new class and existing class descriptions.

1.2.  Use COMPLIES and Realization to determine which existing
instances satisfy the new class description.

2.  Introduce a New Instance

2.1.  Use COMPLIES and Realization to place the new instance into
classes for which the instance satisfies class descriptions.

2.2.  Use INTERSECT to identify other related instances. This may
generate new classes, but is also needed in the next step.

2.3.  Use the Exception Condition to see if the new instance may
be an exception to an existing class description.

2.4.  Based on a decision to place an exception condition into
a class, use EVOLVE to modify the class schema.

Figure 4.1: Main Components of the Conceptual Clustering Algorithm
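
The control flow for introducing a new instance (steps 2.1 to 2.4 of Figure 4.1) can be sketched roughly as follows. This is only an outline under stated assumptions: every operation is passed in as a callable, the function name insert_instance and the argument list of is_exception are hypothetical, and the stubs in the usage lines stand in for the real COMPLIES, INTERSECT, Exception Condition, and EVOLVE described in the following sections.

```python
def insert_instance(instance, classes, instances,
                    complies, intersect, is_exception, evolve):
    """Outline of steps 2.1-2.4; all operations are supplied by the caller."""
    # 2.1 Realization: classes whose descriptions the instance satisfies.
    member_of = [c for c in classes if complies(c, instance)]
    # 2.2 Similarity to stored instances (case-based reasoning).
    similarities = {other: intersect(instance, other) for other in instances}
    # 2.3 and 2.4: test the remaining classes for an exception, then evolve.
    for c in classes:
        if c not in member_of and is_exception(c, instance, similarities):
            evolve(c, instance)
            member_of.append(c)
    instances.append(instance)
    return member_of


# Usage with stub operations (purely illustrative names and decisions).
evolved = []
stored = ["Fred"]
result = insert_instance(
    "Sue", ["Student", "Professor"], stored,
    complies=lambda c, i: c == "Student",
    intersect=lambda a, b: {"shared": []},
    is_exception=lambda c, i, sims: c == "Professor",
    evolve=lambda c, i: evolved.append(c),
)
print(result, evolved, stored)
```

Passing the operations as parameters keeps the outline independent of how each one is actually defined.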

4.2.1   Data  Model 

The  CANDIDE  data  model  is  used  for  building  instance  and  class  descriptions. 
CANDIDE  was  described  in  detail  in  Section  2.2.  Each  CANDIDE  object  has  a  name 
which  is  a  unique  object  identification.  In  addition,  there  is  a  lexicon  of  linguistic 
expressions  that  contains  a  mapping  from  each  expression  to  one  or  more  objects  in 
the  database.  That  is,  natural  language  expressions  can  be  associated  with  an  object. 
This  association  is  dynamic.  Whereas  the  object  identification  never  changes,  the 
association  from  words  to  objects  can  change  frequently.  The  mapping  from  words 
to  objects  is  many-to-many. 
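
The lexicon described above can be pictured as a two-way, many-to-many index. The sketch below is an assumption-laden illustration, not CANDIDE's implementation: the class name Lexicon, its method names, and the sample expressions are all hypothetical.

```python
from collections import defaultdict


class Lexicon:
    """Many-to-many association between expressions and object identifiers."""

    def __init__(self):
        self._objects = defaultdict(set)      # expression -> object ids
        self._expressions = defaultdict(set)  # object id -> expressions

    def associate(self, expression, object_id):
        self._objects[expression].add(object_id)
        self._expressions[object_id].add(expression)

    def dissociate(self, expression, object_id):
        self._objects[expression].discard(object_id)
        self._expressions[object_id].discard(expression)

    def objects_for(self, expression):
        return set(self._objects.get(expression, ()))

    def expressions_for(self, object_id):
        return set(self._expressions.get(object_id, ()))


lexicon = Lexicon()
lexicon.associate("professor", "Professor")
lexicon.associate("teacher", "Professor")  # two expressions for one object
lexicon.associate("teacher", "Student")    # one expression for two objects
lexicon.dissociate("teacher", "Student")   # associations change; object ids do not
print(lexicon.objects_for("teacher"), lexicon.expressions_for("Professor"))
```

The object identifiers are never rewritten; only the index entries are added or removed, which mirrors the distinction drawn above between a fixed identification and a dynamic association.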

Example 1: Here are a few classes, {Student, Professor}, and some instances, {John,
Joe, Mary, Jim, Fred, Sally} from a University database. The database also includes
some other primitive classes such as Person, Man, Woman, Rtype, Location, Course,
and Department:


Student
    DEFINED
    SUPERCLASSES: Person
    SUBCLASSES: Graduate Student
    INSTANCES: Jim, Fred, Sally
    ATTRIBUTE RESTRICTIONS
        Major: ATLEAST 1 CLASS Department
        Courses: ALL CLASS Course
        Age: EXACTLY 1 RANGE(18,22)
        Advisor: ATLEAST 1 CLASS Professor
        Residence: COMPOSITE
            Type: EXACTLY 1 CLASS Rtype
            Location: EXACTLY 1 CLASS Location

Professor
    DEFINED
    SUPERCLASSES: Person
    SUBCLASSES: Assistant Professor, Associate Professor, Full Professor
    INSTANCES: John, Joe, Mary
    ATTRIBUTE RESTRICTIONS
        Title: EXACTLY 1 Professor
        Teaches: ATLEAST 1 CLASS Course
        Advises: ATLEAST 1 CLASS Student
        Department: ATLEAST 1 CLASS Department
        Salary: EXACTLY 1 INTEGER


Jim
    PARENTS: Student, Man
    ATTRIBUTES
        Major: Computer Engineering
        Courses: Data Structures, Pascal, Calculus
        Age: 10
        Advisor: John
        Residence: COMPOSITE
            Type: Dormitory
            Location: Collegeville

John
    PARENTS: Professor, Man
    ATTRIBUTES
        Title: Full Professor
        Department: Computer Engineering
        Teaches: Databases, Algorithms
        Advises: Jim
        Salary: 70,000

Fred
    PARENTS: Student, Man
    ATTRIBUTES
        Major: Electrical Engineering
        Courses: Probability, Physics
        Age: 19
        Advisor: Joe
        Residence: COMPOSITE
            Type: Apartment
            Location: Collegeville

Joe
    PARENTS: Professor, Man
    ATTRIBUTES
        Title: Associate Professor
        Department: Electrical Engineering
        Teaches: Network Analysis, Semiconductors, Control
        Advises: Fred, Sally
        Salary: 40,000

Sally
    PARENTS: Student, Woman
    ATTRIBUTES
        Major: Electrical Engineering
        Courses: Network Analysis, Instrumentation
        Age: 21
        Advisor: Joe, Mary
        Residence: COMPOSITE
            Type: Apartment
            Location: Collegeville

Mary
    PARENTS: Professor, Woman
    ATTRIBUTES
        Title: Assistant Professor
        Department: Computer Engineering
        Teaches: Pascal, Data Structures
        Advises: Sally
        Salary: 35,000


4.2.2   Inserting  a  New  Class 

Subsumption and Classification automatically insert a new class into an existing
database. The use of these procedures in CANDIDE was discussed at greater length
in Section 2.3. Class C1 subsumes class C2 if every instance of C2 is also an instance of
C1. Subsumption can be determined by the class descriptions. That is, C1 subsumes
C2 if every superclass of C1 subsumes C2 and the attribute restrictions of C2 imply the
attribute restrictions of C1. Thus, the subsumption relationship can be determined
for any two classes, C1 and C2, by using the SUBSUME function:

SUBSUME(C1, C2) = {TRUE, FALSE}

which is TRUE if C1 subsumes C2 and FALSE otherwise. In other words, C1 subsumes
C2 if C2 satisfies the necessary and sufficient conditions specified in the class
description of C1.
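
The structural test just described can be sketched under strong simplifying assumptions. In the sketch below only cardinality restrictions (minimum, maximum) are modeled, PRIMITIVE classes subsume only their declared descendants, and the class "Doctoral Candidate" with its restrictions is a hypothetical addition for illustration. The function and table names are likewise assumptions and are not CANDIDE's interface.

```python
import math

# Hypothetical class table; restrictions are (min, max) cardinalities only.
CLASSES = {
    "Thing": {"defined": False, "supers": [], "restrictions": {}},
    "Person": {"defined": False, "supers": ["Thing"], "restrictions": {}},
    "Student": {"defined": True, "supers": ["Person"],
                "restrictions": {"Advisor": (1, math.inf)}},
    "Doctoral Candidate": {"defined": True, "supers": ["Person"],
                           "restrictions": {"Advisor": (1, 1), "Thesis": (1, 1)}},
}


def ancestors(name):
    """All declared superclasses of a class, direct and indirect."""
    found = set()
    for sup in CLASSES[name]["supers"]:
        found |= {sup} | ancestors(sup)
    return found


def all_restrictions(name):
    """Own and inherited restrictions, keeping the tightest interval."""
    merged = {}
    for cls in [name, *ancestors(name)]:
        for attr, (low, high) in CLASSES[cls]["restrictions"].items():
            old_low, old_high = merged.get(attr, (0, math.inf))
            merged[attr] = (max(low, old_low), min(high, old_high))
    return merged


def subsume(c1, c2):
    """TRUE if c1 subsumes c2 according to the class descriptions."""
    if c1 == c2:
        return True
    if not CLASSES[c1]["defined"]:
        # A PRIMITIVE class subsumes only what was declared beneath it.
        return c1 in ancestors(c2)
    if not all(subsume(sup, c2) for sup in CLASSES[c1]["supers"]):
        return False
    have = all_restrictions(c2)
    # Each restriction of c1 must be implied by a restriction of c2.
    return all(
        attr in have and low <= have[attr][0] and have[attr][1] <= high
        for attr, (low, high) in CLASSES[c1]["restrictions"].items()
    )


print(subsume("Student", "Doctoral Candidate"))
print(subsume("Doctoral Candidate", "Student"))
```

In this sketch Student is found to subsume the hypothetical class even though no superclass link between them was declared, which is the point of determining subsumption from descriptions.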

Classification is a procedure that uses SUBSUME to place a new class into the
existing database. The classifier finds the most specific classes subsuming the new
class and the most general classes subsumed by the new class. In effect, the new class
is compared with each existing class using a search procedure to limit the number
of candidate classes which need to be checked. Once the position of the new class
has been determined, existing instances can be added to the class using a function
described in the next section.

This procedure for inserting a new class applies only if C1 is a DEFINED class.
In the past, when C1 was a PRIMITIVE class, SUBSUME had to be determined
manually. In the next section, it is shown how case-based reasoning can be used to
deal with PRIMITIVE classes in a more automated fashion.

4.2.3 Inserting a New Instance

4.2.3.1  Realization  and  COMPLIES 

The first step in inserting a new instance is to identify existing class descriptions
that are satisfied by the new instance. Realization and COMPLIES are analogous
functions to Classification and SUBSUME. The function COMPLIES is a test between
an instance I and class C:

COMPLIES(C, I) = {TRUE, FALSE}

I is an instance of C if I is an instance of all superclasses of C, and the attributes
of I satisfy the attribute restrictions of C. That is, I must satisfy the necessary and
sufficient conditions given in the class description of C. In Example 1, the instance
Jim satisfies all the attribute restrictions in class Student, thus COMPLIES(Student,
Jim) = TRUE. Realization is a searching procedure over the class taxonomy that
effectively applies COMPLIES to each class in order to identify the most specific classes
satisfied by I.
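
A simplified version of the COMPLIES test can be sketched as follows, using the instance Fred and part of the Student description from Example 1. The sketch is a rough illustration under assumptions: only cardinalities and numeric ranges are checked (filler classes such as CLASS Department are ignored), RANGE bounds are treated as inclusive, the link from Man to Person is assumed, and the dictionary layout and function names are hypothetical.

```python
# Assumed fragment of the generalization hierarchy: class -> superclasses.
SUPERS = {"Person": [], "Man": ["Person"], "Student": ["Person"]}

STUDENT = {
    "supers": ["Person"],
    "restrictions": {
        "Major": {"min": 1},                            # ATLEAST 1
        "Age": {"min": 1, "max": 1, "range": (18, 22)}, # EXACTLY 1 RANGE(18,22)
        "Advisor": {"min": 1},                          # ATLEAST 1
    },
}

FRED = {
    "parents": ["Man"],
    "attributes": {
        "Major": ["Electrical Engineering"],
        "Courses": ["Probability", "Physics"],
        "Age": [19],
        "Advisor": ["Joe"],
    },
}


def is_a(instance, class_name):
    """True if class_name is a parent of the instance or one of their ancestors."""
    pending = list(instance["parents"])
    while pending:
        cls = pending.pop()
        if cls == class_name:
            return True
        pending.extend(SUPERS.get(cls, []))
    return False


def complies(cls, instance):
    """Check superclass membership, then each attribute restriction."""
    if not all(is_a(instance, sup) for sup in cls["supers"]):
        return False
    for attr, rule in cls["restrictions"].items():
        values = instance["attributes"].get(attr, [])
        if len(values) < rule.get("min", 0):
            return False
        if len(values) > rule.get("max", float("inf")):
            return False
        if "range" in rule:
            low, high = rule["range"]
            if not all(low <= v <= high for v in values):
                return False
    return True


# A hypothetical variant of Fred whose age falls outside the range.
older = {**FRED, "attributes": {**FRED["attributes"], "Age": [30]}}
print(complies(STUDENT, FRED), complies(STUDENT, older))
```

Realization would then apply such a test across the taxonomy and keep only the most specific classes for which it succeeds.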

4.2.3.2  Intersection  preprocessing 

Classification and Realization build taxonomies based on necessary and sufficient
conditions specified in class descriptions. These functions provide the main cognitive
model and explanation-based learning portion of the clustering algorithm. Intersection
is a comparison between two instances. Intersection, a form of case-based
reasoning, provides the capability needed to satisfy the other category formation
principles.

The purpose of INTERSECT is to determine whether two instances have anything
in common, and if so what it is. INTERSECT is a function that takes two instances,
I1 and I2, and produces a class description C:

INTERSECT(I1, I2) = C

C is created from the components of I1 and I2 which are the same. C is the minimal
class description satisfied by both I1 and I2. INTERSECT is used as a basis of
similarity. C specifies the ways in which I1 and I2 are similar.
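
As a first intuition, the idea of keeping what two instances have in common can be shown with a deliberately flat sketch over Fred and Sally from Example 1. This is not the graph-based definition of INTERSECT: it keeps only identical parents and identical attribute values, omits the COMPOSITE Residence attribute, and does no generalization of differing values. The function name flat_intersect and the dictionary layout are assumptions.

```python
FRED = {
    "parents": ["Student", "Man"],
    "attributes": {
        "Major": ["Electrical Engineering"],
        "Courses": ["Probability", "Physics"],
        "Age": [19],
        "Advisor": ["Joe"],
    },
}

SALLY = {
    "parents": ["Student", "Woman"],
    "attributes": {
        "Major": ["Electrical Engineering"],
        "Courses": ["Network Analysis", "Instrumentation"],
        "Age": [21],
        "Advisor": ["Joe", "Mary"],
    },
}


def flat_intersect(i1, i2):
    """Keep only the parents and attribute values that are literally shared."""
    parents = set(i1["parents"]) & set(i2["parents"])
    attributes = {}
    for attr in i1["attributes"].keys() & i2["attributes"].keys():
        shared = set(i1["attributes"][attr]) & set(i2["attributes"][attr])
        if shared:
            attributes[attr] = shared
    return {"parents": parents, "attributes": attributes}


common = flat_intersect(FRED, SALLY)
print(common)
```

Here the two students share a parent class, a major, and an advisor, while their ages and courses simply drop out; a fuller treatment would also generalize such differing values to a common type or class.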

To define INTERSECT, it is convenient to view each instance as a connected,
possibly cyclic, directed graph having a root node. INTERSECT can then be defined
as a matching between two graphs. Simply stated, the Immediate Description
Graph of an instance is just those components of the graph which are identified in
the instance object. The Extended Description Graph is obtained by recursively
expanding the other database objects contained within the instance object into their
Immediate Description Graphs. The Extended Description Graph of an instance
includes all objects of the database which are connected with the instance. A simple
transformation enables an instance to be converted to an Immediate and Extended
Description Graph. Figure 4.2 gives the Immediate and Extended Description Graph
for an instance from Example 1.

Figure 4.2: The Extended Description Graph and Immediate Description Graph (In
Bold) for Instance Jim

Procedure IDG: Immediate Description Graph (IDG) of Instance I

The IDG of I is a directed graph IDG = {V, E, R} with nodes V, arcs E,
and a root R. V can be partitioned into {R, Vp, Vs, Vv} where Vp is a set of nodes
corresponding to the PARENTS of I, Vs is a set of value set nodes, and Vv is a set of
value nodes. E can be partitioned into {Ep, Ea, Ev} where Ep is a set of PARENT
arcs, Ea a set of ATTRIBUTE arcs, and Ev a set of VALUE arcs. IDG is constructed
as follows:

1.  Construct  a  root  node  R  which  contains  the  INSTANCE  NAME. 

2.  Create PARENT arcs. One PARENT arc in Ep going out from R is created
for each of I's PARENTS. Each PARENT arc is connected to a node in Vp
containing the CLASS NAME for that parent.

3.  Create ATTRIBUTE arcs. One arc in Ea going out from R is created for
each of I's ATTRIBUTES, labeled by the ATTRIBUTE NAME. Since each
attribute may have more than one value, a graphical representation is necessary
for describing the attribute value set. This is accomplished by connecting each
attribute arc to a node in Vs called a value set node. The value set node has an
outgoing arc in Ev labeled VALUE for each value of the attribute. Each VALUE
arc is connected to a value node in Vv which is the root of another directed graph
representing the value. Thus, an attribute's value set is represented by a set of
value nodes in Vv. These nodes are connected to the attribute's value set node.


4.  Create graphs for each attribute value. A value of type STRING, INTEGER, or
REAL is stored in a single terminal node in Vv. Even though RANGE contains
an internal structure (minimum and maximum value), it is not expanded further
and is stored in a single terminal node in Vv. A single node in Vv containing the
INSTANCE NAME is created for a value of type INSTANCE. A value of type
COMPOSITE is expanded into an IDG. The VALUE arc is connected to the
root of this IDG. A value of type CLASS is treated by replacing the class with
all of its instances. The single VALUE arc for this class is removed and a new
VALUE arc is created for each instance of the class. This VALUE arc is rooted
in the value set node and points to a node in Vv containing the INSTANCE
NAME.
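
The four construction steps above can be sketched as a small graph builder. The representation is an assumption made for illustration: arcs are (source, label, target) triples, nodes are tagged tuples, a COMPOSITE value is given as a nested dictionary, and values of type CLASS (which would be replaced by their instances) are not handled. The function name build_idg is hypothetical.

```python
from itertools import count


def build_idg(name, parents, attributes):
    """Return (root, arcs); arcs are (source, label, target) triples."""
    ids = count(1)
    arcs = []
    root = ("root", name)                        # step 1: root node

    def expand(node, node_parents, node_attributes):
        for parent in node_parents:              # step 2: PARENT arcs
            arcs.append((node, "PARENT", ("class", parent)))
        for attr, values in node_attributes.items():
            value_set = ("valueset", next(ids))  # step 3: value set node
            arcs.append((node, attr, value_set))
            for value in values:                 # step 4: one VALUE arc per value
                if isinstance(value, dict):      # COMPOSITE becomes a nested IDG
                    target = ("composite", next(ids))
                    expand(target, [], value)
                else:                            # atomic value or instance name
                    target = ("value", value)
                arcs.append((value_set, "VALUE", target))

    expand(root, parents, attributes)
    return root, arcs


fred_root, fred_arcs = build_idg(
    "Fred",
    ["Student", "Man"],
    {
        "Major": ["Electrical Engineering"],
        "Courses": ["Probability", "Physics"],
        "Age": [19],
        "Advisor": ["Joe"],
        "Residence": [{"Type": ["Apartment"], "Location": ["Collegeville"]}],
    },
)
for arc in fred_arcs:
    print(arc)
```

Using one value set node per attribute keeps multi-valued attributes such as Courses distinct from the individual values they contain.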

Procedure EDG: Extended Description Graph (EDG) of Instance I

The EDG of I is a graph EDG = {V, E, R}. V and E can be partitioned as in
the IDG. EDG is constructed as follows:

1.  Start with the IDG of I. The root of the EDG is the root of the IDG. V and
E contain the nodes and arcs of the IDG.

2.  Add ancestor classes. Consider the class generalization hierarchy as a directed
graph H = {C, S, T} where C is the set of nodes containing one node for
each class in the database (the node contains the CLASS NAME), S is the set
of directed arcs containing one arc from each class node in C to each of its
immediate SUPERCLASSES, and T is the root of the generalization hierarchy
(Thing). Add node c from C to the EDG of I if there is a path in H connecting
node p to node c where p is a parent class of I (p is already contained in the
IDG of I). Add arc s from S to the EDG if s is contained within the path
from at least one parent of I to some node in C.

These ancestor nodes are contained in Vp, and the arcs are contained in Ep. Add
additional arcs to the EDG so that the arcs in Ep are closed under transitivity.
That is, if (c1, c2) and (c2, c3) are arcs in Ep, then (c1, c3) must also be in Ep.

3.  Recursively  expand  each  instance  node  within  any  attribute  value  set  in  the 
IDG  into  an  EDG. 
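
Step 2 of the procedure, adding ancestor classes and closing the PARENT arcs under transitivity, can be sketched as follows. The hierarchy fragment, the placeholder root name "R", and the function name are assumptions made for illustration; the closure loop is a straightforward fixed-point computation rather than an efficient algorithm.

```python
# Assumed fragment of the generalization hierarchy: class -> immediate superclasses.
SUPERCLASSES = {
    "Thing": [],
    "Animal": ["Thing"],
    "Person": ["Animal"],
    "Student": ["Person"],
    "Woman": ["Person"],
}


def ancestor_part(parents, root="R"):
    """Return (nodes, arcs) of the ancestor portion of an EDG."""
    nodes, arcs = set(), set()
    pending = list(parents)
    while pending:                       # walk upward from each parent class
        cls = pending.pop()
        if cls in nodes:
            continue
        nodes.add(cls)
        for sup in SUPERCLASSES[cls]:
            arcs.add((cls, sup))
            pending.append(sup)
    arcs |= {(root, p) for p in parents}  # PARENT arcs already in the IDG
    changed = True
    while changed:                       # close the arcs under transitivity
        changed = False
        for a, b in list(arcs):
            for c, d in list(arcs):
                if b == c and (a, d) not in arcs:
                    arcs.add((a, d))
                    changed = True
    return nodes, arcs


nodes, arcs = ancestor_part(["Student", "Woman"])
print(sorted(nodes))
print(("R", "Thing") in arcs)
```

After the closure the root is connected directly to every ancestor, not only to its immediate parents.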

4.2.3.3 Definition of INTERSECT

Definition: INTERSECT(I1, I2)

Given the Extended Description Graphs EDG1 = {V1, E1, R1} and EDG2 =
{V2, E2, R2} of two instances I1 and I2, INTERSECT(I1, I2) is obtained by creating
a new graph EDGc = {Vc, Ec, Rc} which is the graph intersection of EDG1 and
EDG2. EDGc can then be converted into a class description.

Consider:

S = {V1 x V2, E1 x E2, (R1, R2)}

Then INTERSECT(I1, I2) = EDGc if there is a function F which maps EDGc into
S such that a component of EDGc is mapped to a pair of matched components from
EDG1 and EDG2. The mapping F is defined as follows:

F : Rc  -> (R1, R2)

F : Vpc -> Vp1 x Vp2
F : Vsc -> Vs1 x Vs2
F : Vvc -> Vv1 x Vv2

F : Epc -> Ep1 x Ep2
F : Eac -> Ea1 x Ea2
F : Evc -> Ev1 x Ev2

1.  Roots. The root of EDGc maps to the matched pair of roots of EDG1 and
EDG2.

Example  2:  Figure  4.3  illustrates  the  mapping  function  F  which  maps  the 
root  of  EDGc  to  the  matched  pair  of  roots  from  the  Extended  Description 
Graphs  for  the  two  instances. 

Figure 4.3: F Mapping the Root of EDGc to the Matched Pair of Roots from EDG1
and EDG2

2.  Ancestors. There is an ancestor node and arc in EDGc for each ancestor
node and arc occurring in both EDG1 and EDG2. Transitive closure of Ep1
and Ep2 guarantees that there is an arc from the root node of EDGc to each of
the most specific ancestor nodes in EDGc.

IF c ∈ Vpc &
   c1 ∈ Vp1 &
   c2 ∈ Vp2 &
   CLASS NAME(c) = CLASS NAME(c1) = CLASS NAME(c2) &
   ∃ a path from R1 to c1 in EDG1 &
   ∃ a path from R2 to c2 in EDG2
THEN F(c) = (c1, c2)

Here CLASS NAME(c) is the name of the class associated with ancestor node
c. For the arcs:

IF a = (c', c'') ∈ Epc &
   a1 = (c1', c1'') ∈ Ep1 &
   a2 = (c2', c2'') ∈ Ep2 &
   F(c') = (c1', c2') &
   F(c'') = (c1'', c2'')
THEN F(a) = (a1, a2)


Example 3: Suppose instance I1 has ancestors Vp1 = {Student, Woman,
Person, Animal, Thing}, and instance I2 has ancestors Vp2 = {Child, Person,
Animal, Thing}. Then the intersection of the ancestors of these two instances
is Vpc = {Person, Animal, Thing}. The mapping function maps an ancestor
node in EDGc to matched pairs of ancestor nodes in EDG1 and EDG2, for
example F(Person) = (Person, Person). An example for arcs: F((Rc, Person))
= ((R1, Person), (R2, Person)).

3.  Attributes.  There  is  an  arc  in  Eac  for  each  pair  of  arcs  from  Eai  and  Ea2 
which  are  labeled  by  related  attributes.  Label  this  arc  in  Eac  using  the  most 
specific  common  attribute.  Two  attributes  are  related  if  they  have  a  common 
ancestor  in  the  attribute  hierarchy  other  than  Thing.  The  most  specific  common 
attribute  is  the  most  specific  such  ancestor. 

IF a ∈ Eac &
   a1 ∈ Ea1 &
   a2 ∈ Ea2 &
   ∃ an attribute t :
     { L(a) = t &
       ANCESTOR(t, L(a1)) &
       ANCESTOR(t, L(a2)) &
       { ∀y (ANCESTOR(y, L(a1)) & ANCESTOR(y, L(a2))) ->
             ANCESTOR(y, t) } &
       t ≠ Thing }
THEN F(a) = (a1, a2)

Here, ANCESTOR(t1, t2) is TRUE if attribute t1 is above attribute t2 in the
attribute hierarchy. L(a) is the attribute associated with attribute arc a.

Example 4: If the attributes of I1 are Ea1 = {Major, Course, Age, Advisor,
Residence} and the attributes of I2 are Ea2 = {Title, Department, Teaches,
Advises, Salary} then Eac = {Class} where Class is above Course and Teach
in the attribute hierarchy. F(Class) = (Course, Teach).

4.  Attribute value sets. An arc a in Eac maps to a pair (a1, a2) from Ea1 and
Ea2. The arc a is associated with a value set resulting from the intersection of
the value sets associated with a1 and a2. First, the arc a is associated with a
value set node:

IF v ∈ Vsc &
   v1 ∈ Vs1 &
   v2 ∈ Vs2 &
   a = (Rc, v) &
   a1 = (R1, v1) &
   a2 = (R2, v2) &
   F(a) = (a1, a2)
THEN F(v) = (v1, v2)

It is necessary to consider the intersection of every possible pair of values from
the value sets of a1 and a2. This is called the Value Set Intersection (VSI). If
VS(t) is the value set associated with an attribute t (VS(t) is a subset of Vv),
then VSI is defined as:

VSI(a1, a2) = {(v, v1, v2) | ∃ v1 ∈ VS(a1) & ∃ v2 ∈ VS(a2) &
               VINTERSECT(v1, v2) = v &
               v ≠ NULL & (v, v1, v2) is distinct}

VINTERSECT, the intersection of two value nodes, is defined in 5. An element
(v, v1, v2) is distinct if there is not another element (v', v1', v2') in which v is in the
domain of v', but where v1 = v1' or v2 = v2'. That is, if two elements contain the
same value for v or the value of v in one element is a special case of the value in
the other, then they must be formed from two entirely different pairs of value
nodes. This eliminates including the same node too many times.

Function F maps value nodes in Vvc to a pair of nodes from Vv1 and Vv2 if
the value node corresponds to the intersection of the pair:

IF v ∈ VS(a) &
   F(a) = (a1, a2) &
   (v, v1, v2) ∈ VSI(a1, a2)
THEN F(v) = (v1, v2)

Finally, VALUE arcs emanating from a value set node in Evc map to corresponding
pairs of VALUE arcs emanating from the value set nodes of Ev1 and
Ev2:

IF a = (vs, vv) ∈ Evc &
   a1 = (vs1, vv1) ∈ Ev1 &
   a2 = (vs2, vv2) ∈ Ev2 &
   F(vv) = (vv1, vv2)
THEN F(a) = (a1, a2)


Example 5: If attributes $a_1$ and $a_2$ are both called Course, and the following
relationships hold:

VS($a_1$) = {Data Structures, Pascal, Calculus}
VS($a_2$) = {Probability, Physics}

VINTERSECT(Data Structures, Probability) = Course
VINTERSECT(Data Structures, Physics) = Course
VINTERSECT(Pascal, Probability) = Course
VINTERSECT(Pascal, Physics) = Course
VINTERSECT(Calculus, Probability) = Math Course
VINTERSECT(Calculus, Physics) = Course
SUBSUME(Course, Math Course) = TRUE

then VSI($a_1$, $a_2$) = {Course, Math Course}.

F(Math Course) = (Calculus, Probability).
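
The Value Set Intersection of Example 5 can be sketched in a few lines of Python. This is only one possible reading of the distinctness rule, not the implementation used in the system: candidates are taken greedily, most specific intersection first, and a candidate is skipped when it reuses a value node of an already chosen element whose intersection value is related to its own. The `PARENT` taxonomy, the helper names, and the use of the most specific common ancestor as a stand-in for VINTERSECT are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical single-parent taxonomy (child -> parent) chosen to reproduce Example 5.
PARENT = {
    "Math Course": "Course",
    "Calculus": "Math Course",
    "Probability": "Math Course",
    "Data Structures": "Course",
    "Pascal": "Course",
    "Physics": "Course",
}


def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def in_domain(specific, general):
    """True if `specific` is `general` or lies below it in the taxonomy."""
    return general in ancestors(specific)


def vintersect(v1, v2):
    """Stand-in for VINTERSECT: most specific common ancestor, or None for NULL."""
    up2 = set(ancestors(v2))
    return next((n for n in ancestors(v1) if n in up2), None)


def vsi(values1, values2):
    """Greedy reading of the Value Set Intersection with the distinctness rule."""
    candidates = [
        (vintersect(v1, v2), v1, v2) for v1, v2 in product(values1, values2)
    ]
    candidates = [c for c in candidates if c[0] is not None]
    # Most specific intersections first; the stable sort keeps pair order on ties.
    candidates.sort(key=lambda c: -len(ancestors(c[0])))
    chosen = []
    for v, v1, v2 in candidates:
        # A clash: a related intersection value that reuses one of the value nodes.
        clash = any(
            (in_domain(v, w) or in_domain(w, v)) and (v1 == w1 or v2 == w2)
            for w, w1, w2 in chosen
        )
        if not clash:
            chosen.append((v, v1, v2))
    return chosen


elements = vsi(["Data Structures", "Pascal", "Calculus"], ["Probability", "Physics"])
values = {v for v, _, _ in elements}          # {"Course", "Math Course"}
f_map = {v: (v1, v2) for v, v1, v2 in elements}
```

Under these assumptions `f_map["Math Course"]` is `("Calculus", "Probability")`, which agrees with the mapping F given in Example 5. Which pair represents the remaining Course element depends on the candidate order, a detail the definition leaves open.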

5. Intersection of value nodes. The intersection of two value nodes,
VINTERSECT($v_1$, $v_2$), depends on their types. There are several possible
combinations of types (not all are mentioned here):

• Two atomic values (INTEGER, REAL, or STRING) have a non-NULL
intersection only when they are the same type. If they are also the same
value, then the intersection is the value. Otherwise, if they are the same
type, then the intersection is the type. The intersection of two values of
different atomic types is NULL.

• Two values of type RANGE have a non-NULL intersection only if their
ranges overlap. Thus the intersection of RANGE($l_1$, $h_1$) with RANGE($l_2$, $h_2$)
is RANGE($l_2$, $h_1$) if $l_1 < l_2 < h_1$ and $l_2 < h_1 < h_2$. The intersection of
RANGE($l$, $h$) with the integer $n$ is $n$ if $l < n < h$.

• The intersection of two values of type INSTANCE or COMPOSITE is the
INTERSECT of the values.

Example 6: VINTERSECT(1, 1) = 1, VINTERSECT(1, 2) = INTEGER,
VINTERSECT(1, 1.05) = NULL, VINTERSECT(RANGE(1, 10), RANGE(7, 15))
= RANGE(7, 10).
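
The atomic and RANGE cases of step 5 can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the system's code: the `Range` and `AtomicType` names are hypothetical, range bounds are treated as inclusive, the overlap of two ranges is generalized to `max`/`min` of the bounds (the text states only one ordering of the bounds), and the INSTANCE and COMPOSITE cases are left out.

```python
from dataclasses import dataclass
from enum import Enum


class AtomicType(Enum):
    INTEGER = "INTEGER"
    REAL = "REAL"
    STRING = "STRING"


@dataclass(frozen=True)
class Range:
    low: int
    high: int


def atomic_type(value):
    """Map a Python value to its atomic type, or None if it is not atomic."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int in Python; exclude it here
    if isinstance(value, int):
        return AtomicType.INTEGER
    if isinstance(value, float):
        return AtomicType.REAL
    if isinstance(value, str):
        return AtomicType.STRING
    return None


def vintersect(v1, v2):
    """Intersection of two value nodes; None plays the role of NULL."""
    if isinstance(v1, Range) and isinstance(v2, Range):
        low, high = max(v1.low, v2.low), min(v1.high, v2.high)
        return Range(low, high) if low <= high else None
    if isinstance(v1, Range) or isinstance(v2, Range):
        rng, n = (v1, v2) if isinstance(v1, Range) else (v2, v1)
        inside = atomic_type(n) is AtomicType.INTEGER and rng.low <= n <= rng.high
        return n if inside else None
    t1, t2 = atomic_type(v1), atomic_type(v2)
    if t1 is None or t1 is not t2:
        return None              # different atomic types: NULL
    return v1 if v1 == v2 else t1  # same value: the value; same type: the type
```

With these definitions the four cases of Example 6 come out as `1`, `AtomicType.INTEGER`, `None`, and `Range(7, 10)`.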

4.2.3.4 Converting EDGc to a class description

Procedure  CLASS:  Convert  EDGc  into  a  Class  Description 

An EDGc returned by the function INTERSECT($I_1$, $I_2$) can be converted into a
class description $C$ as follows:

1. All classes created by non-NULL intersection of instances are DEFINED classes.

2. Remove all but the most specific ancestor classes. For each pair of classes
($c_1$, $c_2$) in EDGc, remove $c_1$ and any arcs connected to $c_1$ if $c_1$ subsumes $c_2$.
Any remaining ancestor class that is on an arc from the root of EDGc becomes
an element of the SUPERCLASSES of $C$.

3. Form an attribute restriction in $C$ for each attribute arc from the root node
of EDGc. ATLEAST, EXACTLY, and ALL restrictions are created based on
the elements of the value set for the attribute. Recall that the value set of an
attribute in EDGc is created from the Value Set Intersection of attributes $a_1$
and $a_2$ from instances $I_1$ and $I_2$.

(a) If the value set of the attribute in EDGc has $n_1$ values ($v_1$), $n_2$ values
($v_2$), ..., and $n_k$ values ($v_k$), where $v_1$ is in the domain of $v_2$, $v_2$ is in the
domain of $v_3$, ..., $v_{k-1}$ is in the domain of $v_k$, then create an ATLEAST
$n_1 + n_2 + \cdots + n_k$ ($v_k$) restriction.

(b) If an ATLEAST $n$ ($v$) restriction has been created, and $a_1$ and $a_2$ also have
exactly $n$ values each, change the ATLEAST $n$ ($v$) to EXACTLY $n$ ($v$).

(c) If an ATLEAST $n$ ($v$) restriction has been created, and all the values of
$a_1$ and $a_2$ are in the domain of ($v$), but $a_1$ and $a_2$ do not have the same
number of values, add an ALL ($v$) restriction.

4. $C$ must be classified to correctly determine all of its SUPERCLASSES,
SUBCLASSES, and INSTANCES.

Example  7:  The  attribute  restriction  resulting  from  the  attribute  value  intersection 
in  Example  5  is: 

Course:  ATLEAST  1  Math  Course,  ATLEAST  2  Course,  ALL  Course. 

This  requires  that  the  attribute  have  at  least  one  Math  Course  and  at  least  one  other 
Course.  Any  other  attribute  value  must  also  be  an  instance  of  Course. 
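
Rules 3(a) to 3(c) can be sketched as a small Python function. The sketch is a hedged reading rather than the CLASS procedure itself: it assumes a hypothetical child-to-parent taxonomy `PARENT`, takes the list of intersection values from the VSI as input, and emits a cumulative ATLEAST restriction for every distinct value in the list (Example 7 suggests this, although rule 3(a) states it only for the most general value).

```python
# Hypothetical single-parent taxonomy (child -> parent), as in Example 5.
PARENT = {
    "Math Course": "Course",
    "Calculus": "Math Course",
    "Probability": "Math Course",
    "Data Structures": "Course",
    "Pascal": "Course",
    "Physics": "Course",
}


def in_domain(specific, general):
    """True if `specific` is `general` or lies below it in the taxonomy."""
    node = specific
    while node != general:
        if node not in PARENT:
            return False
        node = PARENT[node]
    return True


def restrictions(vsi_values, values1, values2):
    """Build (kind, count, value) restrictions from one attribute's VSI values."""
    same_count = len(values1) == len(values2)
    result = []
    for v in dict.fromkeys(vsi_values):      # distinct values, original order
        # Rule (a): count this value plus every more specific value below it.
        n = sum(1 for w in vsi_values if in_domain(w, v))
        # Rule (b): both attributes have exactly n values.
        kind = "EXACTLY" if same_count and len(values1) == n else "ATLEAST"
        result.append((kind, n, v))
        # Rule (c): every original value is in the domain of v, counts differ.
        if not same_count and all(in_domain(x, v) for x in values1 + values2):
            result.append(("ALL", None, v))
    return result


example7 = restrictions(
    ["Math Course", "Course"],
    ["Data Structures", "Pascal", "Calculus"],
    ["Probability", "Physics"],
)
```

For the value sets of Example 5 this yields ATLEAST 1 Math Course, ATLEAST 2 Course, and ALL Course, the restriction shown in Example 7.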

Example 8: Recall the instances Jim and Fred from Example 1. The intersection
between these two instances is:

INTERSECT(Fred, Jim) =

New_Class_1
  SUPERCLASSES: Student, Man
  ATTRIBUTE RESTRICTIONS
    Major: EXACTLY 1 CLASS Engineering
    Courses: ATLEAST 1 CLASS Math Course
             ATLEAST 2 CLASS Course
             ALL CLASS Course
    Age: EXACTLY 1 INSTANCE 19
    Advisor: COMPOSITE
      SUPERCLASS: Professor, Man
      ATTRIBUTE RESTRICTIONS
        Title: EXACTLY 1 Professor
        Department: EXACTLY 1 CLASS Engineering
        Teaches: ATLEAST 1 CLASS Course
        Advises: ATLEAST 1 CLASS Student
        Salary: EXACTLY 1 INTEGER
    Residence: COMPOSITE
      ATTRIBUTE RESTRICTIONS
        Type: EXACTLY 1 CLASS Rental Unit
        Location: EXACTLY 1 INSTANCE Collegeville

The intersection between the advisors, INTERSECT(John, Joe), is shown as a
COMPOSITE class description. The Advises attribute leads to an infinite cycle, which had
to be truncated. Classification leads to the conclusion that the new class is below
class Student in the generalization hierarchy.

4.2.3.5  NULL  and  TRIVIAL  intersection 

INTERSECT($I_1$, $I_2$) = $C$ is NULL if $C$ = Thing. This happens when $I_1$ and $I_2$
have nothing in common.

INTERSECT($I_1$, $I_2$) = $C$ is TRIVIAL if $C$ already exists as a class in the database.

4.2.3.6  INTERSECT  of  more  than  two  instances 

INTERSECT can be defined over a set of instances:

\[ \mathrm{INTERSECT}(I_1, I_2, \ldots, I_n) = C \]

Such an INTERSECT can be created by recombining the EDGc created from
INTERSECT of two EDGs with additional EDGs:

\[
\mathrm{INTERSECT}(I_1, I_2, \ldots, I_n) =
\mathrm{INTERSECT}(\ldots(\mathrm{INTERSECT}(\mathrm{INTERSECT}(I_1, I_2), I_3), I_4) \ldots, I_n)
\]
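
The nested form above is a left fold of the pairwise operation over the list of instances. A minimal Python sketch follows; `toy_intersect` is a deliberately simplified stand-in (shared attributes with shared values) and not the EDG-based INTERSECT, and the instance data, including Sally's age, are hypothetical.

```python
from functools import reduce


def intersect_all(instances, intersect_pair):
    """Fold a pairwise INTERSECT over a sequence of two or more instances."""
    if len(instances) < 2:
        raise ValueError("INTERSECT needs at least two instances")
    return reduce(intersect_pair, instances)


def toy_intersect(x, y):
    """Toy stand-in: keep attributes both sides share, with their common values."""
    return {a: x[a] & y[a] for a in x.keys() & y.keys() if x[a] & y[a]}


jim = {"Major": {"Engineering"}, "Age": {19}}
fred = {"Major": {"Engineering"}, "Age": {19}}
sally = {"Major": {"Engineering"}, "Age": {20}}

pair = intersect_all([jim, fred], toy_intersect)
triple = intersect_all([jim, fred, sally], toy_intersect)
```

Here `triple` retains fewer attributes than `pair`. The toy operation simply drops Age, whereas the real INTERSECT would generalize it to its type, so only the folding pattern should be read from this sketch.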

Example 9: Recall the instance Sally from Example 1. The intersection over three
instances is:

INTERSECT(Jim, Fred, Sally) =

New_Class_2
  SUPERCLASSES: Student
  ATTRIBUTE RESTRICTIONS
    Major: EXACTLY 1 CLASS Engineering
    Courses: ATLEAST 2 CLASS Course
             ALL CLASS Course
    Age: EXACTLY 1 INTEGER
    Advisor: COMPOSITE
      SUPERCLASS: Professor
      ATTRIBUTE RESTRICTIONS
        Title: EXACTLY 1 Professor
        Department: EXACTLY 1 CLASS Engineering
        Teaches: ATLEAST 1 CLASS Course
        Advises: ATLEAST 1 CLASS Student
        Salary: EXACTLY 1 INTEGER
    Residence: COMPOSITE
      ATTRIBUTE RESTRICTIONS
        Type: EXACTLY 1 CLASS Rental Unit
        Location: EXACTLY 1 INSTANCE Collegeville

Notice that New_Class_2 is more general than New_Class_1. The greater the number
of instances, the less detail in the class resulting from the intersection.

4.2.3.7   Truncation  of  redundant  paths  and  cycles 

The class object $C$ contains only a portion of the complete graph expansion
EDGc. Graph expansion is truncated at certain nodes because of redundant paths
or cycles. This happens in three situations:

1. Common Ancestors. $I_1$ and $I_2$ may have many common ancestors. Only the
most specific ancestors are recorded in $C$. The others are implied by SUPERCLASS
links in the generalization hierarchy.

2. Common Attribute Values. Intersection of attribute values may lead to the
intersection of two instances, $I_1 = I_2 = I$, which are the same. Continuing to follow
the graph expansion along two identical instance values leads to an unnecessarily
lengthy path. By truncating the expansion at the point where two instances
are the same, this expansion is avoided, yet it remains implicit because the value
of $I$ is recorded in $C$.

3. Inverse Links (Cycle). An attribute value may refer back, directly or indirectly,
to an instance containing that attribute value via an inverse link. This may
lead to a cycle. It appears that such a cycle is both possible and meaningful.

4.2.3.8  CONTAINS 

Consider the situation of one instance being contained within another. That is, $I_2$
may appear as a value of some attribute of $I_1$. In such a case, INTERSECT($I_1$, $I_2$) is
typically NULL or TRIVIAL. For example, suppose $I_1$ is Room and $I_2$ is a Chair that
is located inside Room. Then INTERSECT(Room, Chair) is TRIVIAL (they are
both Physical Entities). The fact that one is contained inside the other is certainly
a strong relationship, but it is not involved in determining how the two entities are
similar.

It might be desirable to develop a CONTAINS($I_1$, $I_2$) function which is TRUE if
$I_1$ contains $I_2$. Metonymic relationships [61], that is, references to an object made by
referring to a part of the object, could be discovered in such a way. Though this may
have some useful purpose, it does not appear to be needed in a clustering algorithm.

4.2.3.9  Intersection  over  all  instances  in  the  database 

When inserting a new instance $I$ into the database, it is necessary to compute
INTERSECT between $I$ and every other instance in the database. Since only a small
percentage of the instances in the database would have a non-NULL intersection with
$I$, the task is simplified by first identifying a candidate set of instances. The candidate
set should be as small as possible without excluding any instances that would have a
non-NULL intersection.

One such candidate set is created by first finding the set of all attributes related
to the attributes $A$ of $I$:

\[ \mathrm{AC}(I) = \bigcup_{a_i \in A} \{ a_i, \mathrm{PARENTS}(a_i), \mathrm{CHILDREN}(a_i) \} \]

PARENTS($a_i$) and CHILDREN($a_i$) are the attributes that are above and below $a_i$ in
the attribute hierarchy. If instances in the database are indexed by attributes, it is
possible to quickly retrieve all instances containing at least one attribute in AC. This
set of instances is included in the candidate set. In addition, the new instance $I$ is
realized to find its most specific parents. The instances of these parents must also be
added to the candidate set:

\[
\begin{array}{ll}
\mathrm{CANDIDATE}(I) = & \{ i \mid \exists\, a_j \in \mathrm{AC}(I) \;\&\; \text{instance } i \text{ has an attribute } a_j \} \;\cup \\
 & \{ i \mid \exists\, C .\; C \text{ is a parent of } i \;\&\; C \text{ is a parent of } I \}
\end{array}
\]

Every intersection between $I$ and an instance from CANDIDATE is non-NULL since
there would be at least one related attribute or one common parent.
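
The two definitions above translate almost directly into set operations. The following Python sketch assumes, hypothetically, that the attribute hierarchy is available as two dictionaries and that the database keeps one index from attributes to instances and another from classes to their instances; all names and the sample data are illustrative.

```python
def attribute_closure(attrs, parents, children):
    """AC(I): each attribute of I plus its parents and children in the hierarchy."""
    closure = set()
    for a in attrs:
        closure.add(a)
        closure.update(parents.get(a, ()))
        closure.update(children.get(a, ()))
    return closure


def candidate_set(new_attrs, new_parents, parents, children, by_attribute, by_class):
    """CANDIDATE(I): instances sharing a related attribute or a parent class with I."""
    candidates = set()
    for a in attribute_closure(new_attrs, parents, children):
        candidates.update(by_attribute.get(a, ()))   # attribute index lookup
    for c in new_parents:
        candidates.update(by_class.get(c, ()))       # instances of I's parents
    return candidates


# Hypothetical hierarchy and indexes.
parents = {"Teaches": {"Works On"}}
children = {"Works On": {"Teaches"}}
by_attribute = {"Teaches": {"John"}, "Works On": {"Mary"}, "Major": {"Jim"}}
by_class = {"Professor": {"John", "Joe"}}

found = candidate_set({"Teaches"}, {"Professor"}, parents, children,
                      by_attribute, by_class)
```

In this example Jim is never retrieved, because Major is unrelated to Teaches and Jim is not an instance of Professor.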
4.2.4   Exception  Condition 

In  this  section,  it  is  shown  how  a  new  instance  might  be  placed  in  a  class  even  if  it 
fails  to  satisfy  the  necessary  and  sufficient  conditions  for  class  membership.  The  new 
instance  may  be  uniquely  similar  to  other  instances  in  this  class.  That  is,  it  bears  a 
resemblance  to  instances  in  the  class  that  is  not  shared  by  any  other  instances  outside 
the  class.  Similarity  is  determined  by  INTERSECT.  The  Exception  Condition  is  a 
condition  that  can  be  evaluated  whenever  a  new  instance  is  inserted. 
Test:  Exception  Condition 

When a new instance $I$ is inserted into the database, compute INTERSECT
between $I$ and all other instances in the database. The Exception Condition is raised
if:

1. $I$ fails to meet the necessary and sufficient conditions for class membership
specified in the class description of an existing class $C$.

2. A new non-trivial class $C'$ results from the INTERSECT of $I$ with other existing
instances such that:

(a) EXTENSION($C$) $\cap$ EXTENSION($C'$) $\neq$ NULL

(b) EXTENSION($C'$) $-$ EXTENSION($C$) = $\{I\}$

where EXTENSION($c$) is the set of instances that are members of class $c$.

Conditions 1 and 2 are necessary conditions for $I$ to be an exception to class $C$.

Notice that it is not required that the initial description of $I$ include $C$ as a PARENT.
Condition 2a ensures that $I$ has some similarity with other instances of $C$. Condition
2b ensures that $I$ is "uniquely similar," that is, that no other instances besides
instances of $C$ share in this similarity.
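
Once the extensions are available as sets, the test reduces to three checks. The sketch below is an illustration under assumptions: the function name and its arguments are hypothetical, and the result of condition 1 is passed in as a flag because checking class membership requires the full classifier.

```python
def exception_condition(instance, ext_c, ext_c_prime, satisfies_c):
    """Return True when the Exception Condition is raised for `instance` and class C.

    ext_c       -- EXTENSION(C), the instances of the existing class C
    ext_c_prime -- EXTENSION(C'), the instances of the new class from INTERSECT
    satisfies_c -- whether `instance` meets C's necessary and sufficient conditions
    """
    fails_membership = not satisfies_c                        # condition 1
    shares_instances = bool(ext_c & ext_c_prime)              # condition 2a
    uniquely_similar = (ext_c_prime - ext_c) == {instance}    # condition 2b
    return fails_membership and shares_instances and uniquely_similar


# Hypothetical extensions: C is an existing class, C' the new intersection class.
raised = exception_condition("Reggie", {"John", "Joe"}, {"John", "Joe", "Reggie"}, False)
not_unique = exception_condition("Reggie", {"John", "Joe"},
                                 {"John", "Mary", "Reggie"}, False)
```

In the second call the condition is not raised, because Mary shares the similarity without being an instance of C.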

Example 10: Consider a new instance:

Reggie
  PARENT: Man
  ATTRIBUTES
    Department: Computer Engineering
    Title: Full Professor
    Research Area: Databases
    Salary: 80,000

Reggie (a professor who does not teach) does not conform to the existing class
description for either Student or Professor. The CANDIDATE set for intersection is
{John, Joe, Mary}. The Exception Condition holds where $C$ is Professor and $C'$ is:

C'
  SUPERCLASS: Person
  ATTRIBUTE RESTRICTIONS
    Department: EXACT 1 Department
    Title: EXACT 1 Professor
    Salary: EXACT 1

The Exception Condition is a necessary but not sufficient condition for $I$ to be an
exception to $C$. This is because two situations may apply:

I. $I$ is an exception and therefore an instance of $C$. The structure of $C$ must be
modified to account for $I$ by forming a new class $C_n$ which subsumes $C$ and $C'$.
The new class description is a weakened form of $C$.

II. $I$ is not an exception to class $C$, but represents the beginning of a new class.
In this case the structure of $C$ is not altered.

The Exception Condition cannot distinguish between these two cases.

There are several alternative strategies for distinguishing between Case I and II.
At worst, the user or database administrator will be required to make a decision.
There are many situations in which this is the only alternative, the result being that
a fully-automatic clustering algorithm is not possible. On the other hand, completely
excluding human intervention, as is done in many clustering algorithms, seems
unrealistic. There is evidence showing the importance of cultural influence on category
formation [74], which in the present case amounts to a subjective bias.

A related technique is error correction through dialogue. In this approach the
system automatically assumes Case I. If this assumption is incorrect, the error will
eventually appear when the system incorrectly uses terms associated with class $C_n$
during a dialogue with the user. The user can then correct the erroneous usage. In
exactly the same way, it is conceivable that the user could apply a term incorrectly
and be corrected by the system.

Explanation-based reasoning can be applied to justify or "explain" a choice
between Case I and II. Other cognitive models besides class descriptions may be used
to explain the observed data. Borgida discusses this approach in [7]. The clustering
procedure would benefit from including other types of cognitive models discussed in
Chapter 6.

Selection of Case I requires alteration of the existing class structure. Certain
classes are resistant to change. For example, it is not possible to change a logically
necessary and sufficient condition. Classes at the basic level cannot be altered with
regard to shape or function.

Some  clustering  algorithms  use  a  measure  of  category  quality  to  make  a  decision 
between  Case  I  and  II  automatically  [25,43,64,114].  This  measure  judges  the  resulting 
quality  of  the  class  structure  when  Case  I  is  selected  versus  the  quality  when  Case  II 
is  selected  and  selects  the  structure  which  is  best.  These  measures  often  depend  on 
various  parameters  and  thresholds.  Thus,  they  can  be  somewhat  subjective.  They 
require  that  a  judgement  be  made  about  what  makes  a  good  category.  The  clustering 
algorithm  of  Section  4.2  does  not  attempt  to  judge  the  quality  of  class  structure. 

Finally,  the  hypothesis  generation  approach  tentatively  selects  both  Case  I  and  II. 
The  Exception  Condition  results  in  two  hypotheses  being  formed.  Thus,  rather  than 
making  permanent  changes,  the  changes  resulting  from  selecting  I  or  II  are  marked 
as  hypothetical.  The  system  can  weight  one  or  the  other  hypothesis  when  additional 
evidence  becomes  available. 

4.2.5   EVOLVE  Procedure 

If a decision is made to accept Case I, it is then necessary to alter the class
structure to accommodate the exception. The details of how to accommodate the
new instance are a matter of schema evolution; a variety of options exist. A minimal
requirement is described by EVOLVE:
Procedure:  EVOLVE 

If instance $I$ is to be accepted as an exception to class $C$, then a new class $C_n$
must be created such that:

1. $C_n$ = INTERSECT($I$, EXTENSION($C$))

2. $C_n$ may be NULL.

3. $C_n$ necessarily subsumes $C$ and $C'$.

4. Linguistic expressions mapped to $C$ by the lexicon must now map to $C_n$.

The importance of step 4 is that the terms applied to $C$ now apply to $C_n$ and, thus,
also to $I$. The new class $C_n$ represents the entity type that had previously been
represented by $C$. The old class $C$ retains its structure except that it is now a
subclass of $C_n$. Class $C$ still has the same extension and is still useful as a description
of this extension. $I$ is not an instance of $C$ but of $C_n$. Notice also that if another
instance is added that is exactly like $I$, it will NOT be an exception since it will fit
naturally into the existing classes $C_n$ and $C'$.
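
The bookkeeping in steps 3 and 4 can be sketched as follows. This is a minimal illustration under assumptions, not the EVOLVE procedure as implemented: the schema is reduced to a hypothetical dictionary from class names to their direct superclasses, the lexicon to a dictionary from terms to class names, and the description of $C_n$ (step 1) is assumed to have been computed elsewhere and classified separately.

```python
def evolve(superclasses, lexicon, c, c_n, c_prime=None):
    """Place C_n above C (and C', if given) and remap C's terms to C_n."""
    superclasses.setdefault(c_n, set())
    superclasses.setdefault(c, set()).add(c_n)             # step 3: C_n subsumes C
    if c_prime is not None and c_prime != c_n:
        superclasses.setdefault(c_prime, set()).add(c_n)   # step 3: C_n subsumes C'
    for term in [t for t, cls in lexicon.items() if cls == c]:
        lexicon[term] = c_n                                # step 4: terms follow C_n


# Hypothetical class identifiers and lexicon entry.
superclasses = {"Professor_old": {"Person"}}
lexicon = {"professor": "Professor_old", "student": "Student"}
evolve(superclasses, lexicon, "Professor_old", "Professor_new")
```

After the call the term "professor" maps to the new class, while the old class keeps its description and becomes a subclass of the new one.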

Notice that $C_n$ is necessarily simpler than $C$ ($C_n$ may even be NULL). It contains
fewer conditions for class membership than does the original $C$. This is a result
of the family resemblance effect. As a class acquires more instances, there is less
and less that all the instances have in common. On the other hand, the subclasses
associated with $C_n$ are much richer than for $C$. As a class evolves, it is subdivided
into regions of similar instances. The result is a complex cluster of subclasses and
instances characterized by localized areas of strong similarity, yet over the entire class
the instances may only have a weak similarity.

Example 11: The Exception Condition was raised in Example 10 between a new
instance Reggie and an existing class Professor. If a decision is made to include
Reggie in the class Professor, the Professor class must evolve. The class description
$C'$ given in Example 10 meets the criteria for $C_n$ in the EVOLVE procedure. $C'$
becomes the new Professor class, and the old Professor class is mapped to a new
description, such as "Teaching Professor."

4.2.6   Default  Values  and  Prototypes 

As stated in Section 3.1.3.2, one of the criteria for category structure is the ability
to provide default values and prototypes. These can be computed directly from the
database by reasoning over a set of instances. That is, prototypes and defaults are
like statistical summaries over a specific set of instances. This contrasts with the
more common approaches such as [45] in which prototypes and default values are
specified directly by the system designer and stored within a class object. Whereas in
these systems defaults are values to be inherited by instances from their ancestors, in
the approach presented here prototypes and defaults are generated from the bottom
up. The class descriptions in CANDIDE would not be considered prototypes; such
descriptions are true of all members of the class. The class description would be
included as part of the prototype.

In  the  approach  presented  here,  defaults  and  prototypes  are  always  created  within 
a  context.  A  context  is  a  specified  set  of  instances.  The  defaults  and  prototypes  are 
generated  relative  to  this  set  of  instances. 
Procedure:  Generation  of  a  Prototype 

In  order  to  generate  a  prototype  instance,  Ip,  of  a  class,  it  is  necessary  to: 

1.  Define  the  context  C  by  creating  a  view,  specifying  a  query,  or  by  specifying  an 
existing  class. 

2.  Obtain  the  set  of  instances  satisfying  C.  For  example,  if  a  class  is  specified,  all 
the  instances  of  this  class  are  considered  in  this  set. 

3.  For  the  set  of  instances  satisfying  context  C: 

(a) Determine Typical Parents. A class is included in the prototype if it is
an ancestor of more than half of the instances. Include only most specific
ancestors.

(b) Determine Typical Attributes. An attribute is included in the prototype
if it is an attribute or the ancestor of an attribute in more than half of the
instances. Include only the most specific attributes.

(c) Determine Typical Attribute Values. For each attribute in the prototype,
include a value in the attribute if it is a value for more than half of the
instances.

It is not necessary that there be a real instance which corresponds to $I_p$.
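
The majority rule in steps 3(a) to 3(c) can be sketched with a counter. The sketch is a simplification under stated assumptions: instances are hypothetical dictionaries with a set of parents and a mapping from attributes to value sets, and the class and attribute hierarchies are ignored, so ancestors are not counted and no "most specific" filtering is performed.

```python
from collections import Counter


def majority(collections):
    """Items that occur in more than half of the given collections."""
    total = len(collections)
    counts = Counter(item for items in collections for item in set(items))
    return {item for item, n in counts.items() if 2 * n > total}


def prototype(instances):
    """Build a prototype from the instances that satisfy a context."""
    proto = {"parents": majority([i["parents"] for i in instances]), "attributes": {}}
    for attr in majority([i["attributes"].keys() for i in instances]):
        proto["attributes"][attr] = majority(
            [i["attributes"].get(attr, set()) for i in instances]
        )
    return proto


# Hypothetical context of three instances.
context = [
    {"parents": {"Man"}, "attributes": {"Title": {"Full Professor"}, "Teaches": {"Calculus"}}},
    {"parents": {"Man"}, "attributes": {"Title": {"Full Professor"}, "Teaches": {"Physics"}}},
    {"parents": {"Woman"}, "attributes": {"Title": {"Assistant Professor"}}},
]
typical = prototype(context)
```

Here Teaches is a typical attribute (two of three instances have it) but receives no typical value, since no single course is taught by more than half of the instances.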
Procedure:  Generation  of  a  Default  Value 

A default value is computed exactly like a prototype, except that in addition to
specifying the context, an attribute $A$ must also be specified.

1. Define a context $C$ and obtain the set of instances satisfying the context.

2. Include a value in the default value set for $A$ if the value is present for attribute
$A$ (or a descendant) in more than half of the instances within context $C$.

In use, a default value can be applied to an instance which is known to have an
attribute but is missing the attribute values.
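
A default value set can be computed in the same spirit. In this hedged sketch the instances are hypothetical dictionaries from attribute names to value sets, and the descendants of the attribute are passed in explicitly rather than looked up in an attribute hierarchy; composite values are not handled.

```python
from collections import Counter


def default_values(instances, attribute, descendants=()):
    """Values present for `attribute` (or a descendant) in over half the instances."""
    names = {attribute, *descendants}
    counts = Counter()
    for inst in instances:
        present = set()
        for name in names:
            present |= set(inst.get(name, ()))
        counts.update(present)          # each instance votes once per value
    return {v for v, n in counts.items() if 2 * n > len(instances)}


def apply_default(instance, attribute, defaults):
    """Fill in the default only when the attribute has no values."""
    if not instance.get(attribute):
        instance[attribute] = set(defaults)
    return instance


# Hypothetical context; "Home" is treated as a descendant attribute of "Residence".
students = [
    {"Residence": {"Apartment"}},
    {"Residence": {"Apartment"}},
    {"Home": {"Dormitory"}},
]
defaults = default_values(students, "Residence", descendants=["Home"])
filled = apply_default({"Residence": set()}, "Residence", defaults)
```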

Example 12: In the new Professor class, the prototype is given by:

Typical-Professor
  PARENTS: Man
  ATTRIBUTES
    Department: Computer Engineering
    Title: Full Professor
    Teaches:
    Advises:
    Salary:

The default value for Residence among instances of Student is:

COMPOSITE
  Type: Apartment
  Location: Collegeville

4.3 Related Work

The earliest clustering algorithms were based on numerical techniques [99]. Similarity
is defined as a distance metric within an n-dimensional attribute space. Two
concepts are similar if they are close together in this space. A threshold value is
used to identify the members of a class; that is, concepts within a specified distance
of each other are put into the same class. This approach has severe problems both
in defining the distance metric and in defining the threshold. For example, a concept
may be placed inside or outside a class depending on a small change in the threshold.
The conceptual clustering algorithm presented in Section 4.2 is not based on
any distance metric and does not depend on arbitrary parameters such as thresholds.
Furthermore, numerical clustering algorithms do not exploit the semantics of objects
in forming classes; that is, they are not conceptual clustering algorithms.

Winston  uses  a  function  very  similar  to  INTERSECT  in  his  classic  program  for 
learning  the  concept  of  "arch"  [123].  Positive  and  negative  examples  of  arches  are 
represented  as  semantic  networks.  Two  networks  are  compared  for  similarity  in  terms 
of  their  structure.  However,  Winston's  approach  is  Learning  By  Example  which 
presupposes  that  the  system  is  told  that  an  example  does  or  does  not  belong  to 
a  specified  class  (with  an  emphasis  on  "near  misses").  Thus,  it  is  not  a  conceptual 
clustering  algorithm.  Furthermore,  Winston  violates  the  family  resemblance  principle 
when  he  assumes  that  a  single  specialized  rule  can  be  created  that  can  describe  the 
concept  "arch". 

John Sowa provides an operator called "Join" as a fundamental operation that
can be applied to conceptual graphs [112]. Two graphs are joined by combining
nodes and arcs from each graph that are either identical or related through a type
taxonomy. A "maximal join" is a graph which contains the common subgraphs of
two related graphs. Thus, a maximal join resembles the class which results from the
INTERSECT of two instances in the clustering algorithm. Although these may be
generic graph operations, Sowa does not present any explicit clustering algorithm for
conceptual graphs that uses these operations. The theory of conceptual graphs does
not incorporate a learning or classification algorithm. As in many other representation
systems, conceptual graph theory places minimal emphasis on reasoning about
instances. For example, the issue of exception handling is not addressed.

The first conceptual clustering algorithm was CLUSTER, developed by Stepp [114].
It used Michalski's INDUCE/2 algorithm to generate class descriptions from a set of
instances. Although INDUCE/2 uses Learning By Example, it is applied in such
a way that CLUSTER/2 could operate without a tutor providing examples of class
members. Thus, CLUSTER/2 is a form of Learning by Observation. CLUSTER/2
is entirely automatic. It uses a set of rules called the Lexical Evaluation Function
(LEF) to determine category quality. Class descriptions that fail to meet the quality
control are rejected.

CLUSTER/S incorporated a form of Explanation-based Learning. Domain knowledge
is represented in a Goal-Dependency Network (GDN). The GDN is used to derive
attributes related to a specified goal. For example, if the goal is "Survive," then a
concept such as "Eat Food" is related to the main goal via the GDN. Categories can
thus be formed that correspond to important attributes identified by the GDN.

Class  formation  in  CLUSTER/S  corresponds  to  queries  in  CANDIDE.  Given  a 
goal,  a  new  class  object  can  be  built  containing  attributes  directly  and  indirectly 
related  to  the  goal.  Through  realization,  instances  are  identified  that  belong  to  this 
new  class.  This  is  an  example  of  specifying  a  cognitive  model  and  finding  instances 
that  conform  to  the  model. 

Thus, CLUSTER/S incorporates cognitive modeling but does not include case-based
learning. (Although CLUSTER/S also uses CLUSTER/2 to create subclasses
after the initial class is formed, the creation of these subclasses is irrelevant to the
original goal.) This is the opposite of CLUSTER/2, which forms classes entirely on the
basis of instance similarity. The class descriptions in CLUSTER/2 are built according
to a view which again violates the family resemblance principle. The clustering
algorithm presented in Section 4.2 avoids using any measures of category quality. The
rules in the LEF are arbitrary and difficult to justify, yet they are crucial for making
CLUSTER/2 fully automatic.

Borgida presents an algorithm for modifying database schemas to accommodate
exceptions [7]. It is one of the few applications of machine learning techniques to
database management. Unfortunately, the algorithm requires that the class to which
the exception belongs be identified by the user. Thus, it is a trivial form of exception
handling (see Section 3.1.3.9). Borgida also explores an interesting use of
explanation-based learning to explain why an exception deviates from the class description. This
approach is applied to modification of integrity constraints.

Another major class of incremental clustering algorithms is based on discrimination
networks. These include EPAM [23], CYRUS [56], and UNIMEM [64]. Discrimination
networks sort indexes into a taxonomy. The algorithm is top-down. When
a new instance is inserted, a search is made down each level of the taxonomy. At
each node a decision must be made about which of the node's children most closely
matches the instance. Each node contains a description similar to a class description,
and the new instance is compared to this description. This process is similar to
inserting an instance through Realization (see Section 4.2.3.1).

Since the approach is strictly top-down, it is possible to miss a related instance.
The Intersection algorithm in Section 4.2.3.6 avoids this problem by using a global
search. Ironically, discrimination networks are used extensively in case-based reasoning
as a way of indexing cases, yet they may fail to find related instances.

Discrimination networks are based entirely on similarity comparisons and do not
incorporate cognitive models. The resulting classes are therefore hard to interpret and
do not correspond to natural categories or entity types as would be needed in database
applications. Work is beginning to appear on using the results of discrimination
networks as a basis for building cognitive models [63], but this is very preliminary.

Finally,  a  number  of  conceptual  clustering  algorithms  take  a  probabihty  approach. 
COBWEB  [25]  and  a  similar,  earlier  program  called  WITT  [43],  represent  instances  as 
attribute-value  pairs  as  in  the  other  approaches.  COBWEB  uses  a  statistical  measure 
of  category  utility  to  evaluate  the  quality  of  different  clustering  patterns  over  the  same 
set  of  instances.  This  category  utility  measure  [38]  is  designed  to  maximize  the  ability 
to  predict  instance  attributes  from  knowledge  of  class  membership.  COBWEB  does 
an  excellent  job  of  predicting  the  quantitative  results  of  psychological  experiments  in 
which  subjects  are  asked  to  sort  various  items  into  categories  [96].  However,  these 
experiments  are  conducted  using  mainly  "nonsense"  items  (sorting  abstract  shapes  or 
sorting  strings  of  apparently  random  letter  sequences),  and  in  such  situations  statis- 
tical similarity  is  likely  to  prevail.  COBWEB  does  not  incorporate  cognitive  models 
and  thus  would  have  difficulty  with  many  natural  categories  which  are  determined 
mostly  by  models  rather  than  similarity.  There  has  been  some  preliminary  work  on 
using  COBWEB  to  classify  cognitive  models  [128,127]. 
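
The category utility measure can be made concrete with a small sketch. The formula used below is the one usually given for COBWEB in the literature; it is not stated in this text. The function name `category_utility` and the toy instances are assumptions made only for illustration, and every cluster is assumed to be non-empty.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition.

    `clusters` is a list of clusters; each cluster is a list of instances,
    and each instance is a dict mapping attribute names to values.
    """
    instances = [inst for cluster in clusters for inst in cluster]
    total = len(instances)

    def expected_correct(group):
        # Sum over attribute-value pairs of P(attribute = value | group)^2.
        counts = Counter((a, v) for inst in group for a, v in inst.items())
        return sum((c / len(group)) ** 2 for c in counts.values())

    baseline = expected_correct(instances)
    # Weighted gain in predictability from knowing the class of an instance.
    gain = sum(
        len(cluster) / total * (expected_correct(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)


red = {"color": "red", "shape": "round"}
blue = {"color": "blue", "shape": "square"}
well_separated = [[red, red], [blue, blue]]
mixed = [[red, blue], [red, blue]]
```

A partition that groups identical instances together scores higher than one that mixes them, which is the sense in which the measure rewards the ability to predict attributes from class membership.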

4.4   Conclusions 

An  algorithm  has  been  presented  for  conceptual  clustering  that  partially  auto- 
mates the  process  of  generating  a  database  schema.  The  INTERSECT  function 
generates  a  class  description  from  the  common  components  of  instances.  The  al- 
gorithm incorporates  a  number  of  features  from  category  theory.  In  accordance  with 
the  family  resemblance  effect,  the  algorithm  can  generate  a  class  with  instances  that 
have little or nothing in common. The algorithm can identify exceptions, and modify
the database schema to accommodate an exception. The algorithm incorporates a
tradeoff  between  cognitive  models  and  similarity-based  clustering.  This  is  accom- 
plished by  combining  explanation-based  and  case-based  reasoning.  The  result  is  a 
data  model  with  a  more  realistic  treatment  of  categories,  and  a  better  match  between 
the  resulting  database  schema  and  application  domain. 

Although  the  algorithm  is  semi-automatic  rather  than  fully-automatic,  it  will  act 
as  a  helpful  assistant  to  a  database  designer.  There  is  reason  to  suppose  that  a  fully- 
automated  algorithm  is  not  possible.  The  algorithm  will  begin  to  work  on  a  database 
that  has  been  constructed  initially  by  hand.  As  the  database  develops  and  the  schema 
becomes  more  complex,  the  algorithm  will  be  able  to  make  better  inferences. 

The approach leads to new database inferences such as generation of prototypes
and default values. In addition, it is believed that these techniques can lead to
more  expressive  queries.  For  example,  it  will  be  possible  to  retrieve  instances  that 
not  only  match  the  query  exactly,  but  that  are  related  to  the  query  through  an 
indirect  similarity.  This  means  that  the  system  could  support  analogical  queries. 
The  approach  is  fully  compatible  with  case-based  reasoning,  and  thus  could  retrieve 
cases  directly  or  indirectly  related  to  the  query. 

Errors  in  data  entry,  that  is,  entries  which  are  in  some  sense  unusual,  would 
be  trapped  by  the  Exception  Condition.  This  includes  typographical  errors,  and 
numerical  values  outside  the  range  normally  associated  with  the  context.  The  system 
becomes  more  sensitive  to  errors  as  it  is  populated  by  more  data. 

The computational complexity of the algorithm has not been studied formally,
although there has been extensive work on SUBSUME [83]. It appears that SUBSUME
is intractable in the worst case for all but the most simplified data models.
INTERSECT appears to be much more complex than SUBSUME. However, SUBSUME
and CLASSIFY have already been implemented in a main memory database,
and for typical cases the computations are performed within acceptable time limits.
Though implementation of INTERSECT is not yet complete, it is expected that
INTERSECT will also perform well on real data, and performance will be aided through
extensive indexing of attributes.

The  approach  presented  here  leads  to  many  new  capabilities  for  the  database  man- 
agement system.  It  will  continue  to  close  the  gap  between  databases  and  knowledge 
representation  and  will  lead  to  databases  with  a  more  realistic  view  of  categories. 

4.5    Summary 

Up  to  this  point,  a  complete  description  of  a  terminological  knowledge  represen- 
tation system  based  on  theories  of  categorization  has  been  presented.  The  first  four 
chapters  can  be  summarized  as  follows: 

1.  The  concepts  of  terminological  knowledge  representation  were  introduced  by 
using CANDIDE. Applications to query processing and natural language were
discussed. 

2.  Problems  with  the  original  conceptualization  were  identified.  The  problems 
were  a  result  of  an  incomplete  theory  of  categorization.  New  theories  were 
introduced,  drawing  from  work  done  in  cognitive  science. 

3.  A  formal  conceptual  clustering  algorithm  incorporated  these  new  theories  of 
categorization  into  a  complete  system  for  terminological  reasoning. 

The  next  two  chapters  are  more  application  oriented.  Chapter  5  discusses  direct 
integration of natural language processing with the terminological knowledge
representation system. Chapter 6 discusses applications to modeling and simulation.


CHAPTER  5 
A  CONCEPTUAL  CLUSTERING  ALGORITHM  FOR  LEXICAL  ACQUISITION 


The terminological knowledge representation system and conceptual clustering
algorithm presented in the previous chapters are applied to natural language processing.
This leads not only to a better integration of natural language with databases than
was achieved in Chapter 2, but also, through the new category theory, to a better
representation of linguistic knowledge. The result is a more flexible natural language
system which is able to deal with irregularities. Most important, the system is able to
accommodate new language usage patterns. This process culminates in an algorithm
for lexical acquisition. This chapter begins with an illustration of category effects
occurring at various levels of language analysis. Next, the data modeling and
conceptual clustering principles presented in the previous chapters are
applied to language processing. Finally, the lexical acquisition algorithm is presented
in detail and illustrated by an example.

5.1    Evidence  for  Category  Effects  in  Language 

Category  effects  appear  at  all  levels  of  language  phenomena.  This  section  presents 
a  number  of  exhibits  to  illustrate  the  role  of  family  resemblance,  prototypes,  similarity, 
and cognitive models in phonology, morphology, syntax, and semantics.

5.1.1    Phonology

Miller and Johnson-Laird [76] provide a listing of onomatopoeic and biological
manner-of-speaking verbs. These words mimic the sounds of animals and humans.
The phonetic quality of these sounds is presumably a direct result of emotional or
biological states. Miller and Johnson-Laird have provided a categorization of these
verbs by emotional state along with features, which is summarized in Table 5.1.

Although the categorization of these verbs into emotional states is highly subjective,
and Miller and Johnson-Laird make no definitive claims about the association,
these examples illustrate the tradeoff between similarity-based and theory-based
categorization. Notice that it is very difficult to justify the categorization only on the
basis of similarity of feature values. In some cases a similarity is apparent. Anger is
expressed with a loud, vocalic sound. Satisfaction is expressed with a soft, vocalic
sound. But it is clear that this categorization is guided by other knowledge. In
particular, we also have knowledge of the physical expressions exhibited by the speaker
when using these words. And we have a model associating these expressions with
emotional states. For example, we know that "hiss" is the sound of a snake or cat
used as a warning before attack. We know the situations which will cause a "gasp" or
a "cry." These are as important as the quality of the sound produced in categorizing
emotional state.

5.1.2   Morphology

The  formation  of  tense  in  verbs  or  plurality  in  nouns  follows  prototypic,  regular 
forms,  such  as  adding  -ed  or  -s.  But  tense  and  plurality  are  often  expressed  in  highly 
irregular  forms.  In  most  cases  similarity  effects  can  be  noted  and  a  general  rule 
created  to  describe  the  form  [121]. 

Although  each  of  the  19  categories  for  forming  the  plural  forms  of  nouns  shown  in 
Table  5.2  appears  to  obey  a  general  rule,  variation  is  also  apparent.  Some  irregular 
forms  must  be  treated  on  a  case  by  case  basis.  There  is  neither  a  general  rule  nor 
similarity  involved  in  these  irregular  forms.  Each  is  its  own  model  [120]. 

Several  principles  of  category  structure  are  illustrated  in  the  examples.  First,  there 
is  a  central  prototype  (just  adding  -s),  but  there  is  also  a  diversity  of  exceptional 

Verbs                        Emotion
---------------------------  ------------------------
roar, growl, snarl, bellow   Anger
bark, snap                   Anger
scream, shriek, screech      Anger, Fear
howl                         Anger, Sadness
cry, wail                    Anger, Sadness
squawk, croak                Dissatisfaction
squeal                       Dissatisfaction
whine                        Dissatisfaction
grunt, snort                 Disapproval
hiss                         Disapproval
purr                         Satisfaction
coo                          Satisfaction
bleat, bray                  Nervousness
twitter                      Nervousness
chirp                        Amusement
cackle, chuckle, chortle     Amusement
titter, giggle               Amusement
groan, moan                  Sadness, Dissatisfaction
whimper, sob                 Sadness
sigh                         Sadness, Relief
gasp                         Surprise
pant                         Excitement

Table 5.1: Miller and Johnson-Laird's Categorization of Onomatopoeic and Biological
Manner-of-Speaking Verbs. (The +/- entries of the Loud, High-pitched, Momentary, and
Vocalic feature columns are not recoverable from the scanned copy.)

Singular       Plural
-------------  -------------
floor          floors
table          tables
judge          judges

ax             axes
tax            taxes
fox            foxes

county         counties
baby           babies
fly            flies
authority      authorities

ski            skis
rabbi          rabbis

leaf           leaves
shelf          shelves
wife           wives

studio         studios
cameo          cameos
concerto       concertos

genus          genera
opus           opera
corpus         corpora

alga           algae
minutia        minutiae
alumna         alumnae

index          indices
matrix         matrices
crux           cruces

addendum       addenda
medium         media
phenomenon     phenomena

lens           lenses
sash           sashes
church         churches

ox             oxen

boy            boys
attorney       attorneys
ray            rays

gulf           gulfs
roof           roofs
safe           safes

embargo        embargoes
hero           heroes
tomato         tomatoes

alumnus        alumni
fungus         fungi

apparatus      apparatus

analysis       analyses
parenthesis    parentheses

woman          women
child          children
foot           feet
mouse          mice

Table 5.2: 19 Categories for Forming Plural Nouns

clusters. Most of these clusters can be described by a rule and exhibit their own
prototypes.  Yet,  there  is  a  wide  variety  of  exceptions,  and  among  many  irregular 
forms  there  appears  to  be  no  basis  for  convention. 

The  existence  of  radically  different  irregular  forms  implies  that  there  can  be  a 
significant  error  rate  when  classifying  a  new  case.  Placing  a  highly  irregular  form 
into the wrong category is an error that can only be detected in conversation. Note
that such an error might go undetected yet still be understandable (e.g., indexes,
leafs, concertoes, heros).
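
The interplay of a central prototype, rule-governed clusters, and individually stored cases might be sketched as follows. The suffix rules, the stored cases, and the function name `pluralize` are illustrative assumptions; they cover only part of Table 5.2 and are not the algorithm developed later in this chapter.

```python
# Individual cases that follow no rule; each is stored as its own model.
STORED_CASES = {
    "ox": "oxen", "child": "children", "woman": "women",
    "foot": "feet", "mouse": "mice", "gulf": "gulfs",
}

# Exceptional clusters, each described by a suffix rule (tried in order).
CLUSTER_RULES = [
    ("fe", "ves"),   # wife -> wives
    ("f", "ves"),    # leaf -> leaves
    ("x", "xes"),    # fox -> foxes
    ("sh", "shes"),  # sash -> sashes
    ("ch", "ches"),  # church -> churches
]

VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return a plural: stored case first, then cluster rule, then prototype."""
    if noun in STORED_CASES:
        return STORED_CASES[noun]
    # Consonant + y: county -> counties (but boy -> boys).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    for singular_end, plural_end in CLUSTER_RULES:
        if noun.endswith(singular_end):
            return noun[: -len(singular_end)] + plural_end
    # Central prototype: just add -s.
    return noun + "s"
```

A noun whose irregular form has not been stored is regularized: in this sketch "index" becomes "indexes" and "hero" becomes "heros", which are the understandable errors noted above.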

5.1.3   Syntax

Category effects in syntax are illustrated in this example on the grammatical
structure of questions. Sample sentences were obtained from volunteers who were
asked to write questions after being shown a book on ornamental plants [72]. A
portion of the sample included "Wh_" questions in which an auxiliary is separated
from the main verb by a noun acting as the subject. The prototype syntactic structure
of the Wh_ question is of the form:

<Wh_> <Auxiliary> <Noun> <Verb>

Where do palms grow?

The sample sentences included a number of Wh_ examples such as:

What shape does the fruit of the Solitaire Palm have?

Even this sentence deviates slightly from the prototype. The Wh_ has been altered
to take the form of a noun phrase, and the noun subject has been altered into a noun
phrase including a prepositional phrase. Certainly the question format need not even
begin with a Wh_ form as in:

How high can the Pygmy Date Palm grow?
How many trunks can the Senegal Date Palm have?
How many pounds of fruit can one Date Palm produce?

A related form drops the leading phrase altogether and begins with the auxiliary:

Does Coontie belong to the family Cycads?
Does the Princess Palm have flowers and fruit?

Thus, the beginning question phrase can be an adjective, a noun phrase, or simply
absent. In addition, the beginning question phrase can be a prepositional phrase:

In what climates will the Date Palm produce fruit?
In what part of Florida can the India Date Palm be planted?
From where did the MacArthur Palm originate?

An interesting deviational form (a form which deviates from accepted usage)
contains a leading prepositional phrase, but the preposition itself has been moved to the
end of the sentence:

Where does the Cliff Date Palm originate from?
What is the seed of the Date Palm used for?
What does the fruit of the Cocos Plumasa taste like?
What species of plant does the Sugar Palm belong to?
How can the trunk of the tree be described as?
What two names may Phoenix refer to?
What kind of soil does the Yellow Butterfly Palm grow in best?

These deviational sentences can only be understood because of their similarity to the
form in which the leading prepositional phrase is complete.

The only syntactic structure that all these examples have in common is the
<Auxiliary> <Noun Phrase> <Verb> form contained within the sentence. The category
structure for these examples displays a richness of structure not at all evident in the
main "Wh_" prototype.

5.1.4   Semantics and Cognitive Models

The  way  in  which  a  word  is  used  in  different  situations  also  exhibits  category 
effects.  To  illustrate,  a  sample  of  126  sentences  containing  the  word  "soil"  was  taken 
from  technical  literature  on  ornamental  plants  [5]  by  using  an  electronic  text  searching 
program  which  can  extract  all  the  sentences  containing  a  particular  word.  Soil  has 
a  prototypic  use,  as  the  medium  in  which  plants  grow.  But  there  is  a  wide  variation 
in use over the 126 sentences. Individual sentences can be clustered into groups of
similar use. For example, many of the sentences referred to soil characteristics:

moist  loamy  soil 
muck  type  soil 
rich  fibrous  soil 
rich  moist  soil 
soil  is  too  sandy 

In  another  group,  soil  acts  as  an  adjective: 

soil sample
soil water
soil type
soil pH
soil characteristics
soil conditions

Then  there  are  different  purposes  for  soil  such  as: 

potting  soil 
garden  soil 

At least three different models relating to soil were needed to understand certain
sentences. One model treats soil as a location in space having a surface and a depth:

lifting the tree from the soil
cover the soil
through the soil
above the soil surface
on top of the soil
top soil
upper six inches of soil

A second model treats soil as a storage device with a capacity for water. In this
model, soil is like a tank that can be filled with water to various levels of capacity:

the water holding capacity of the soil
the soil is dry
soil water is getting low
water may be in the soil
moist soil
an efficient watering does not saturate the soil

The third model, involving the relationship between water balance, soil, and plants, is
needed to understand this sentence:

During  such  times  plants  wilt  even  though  water  may  be  in  the 
soil  because  they  are  losing  water  faster  than  it  is  absorbed 
through  the  root  systems. 

Understanding this sentence requires a model describing the flow of water from the
soil,  into  the  roots,  up  to  the  leaves  of  the  plant,  and  out  from  the  leaves  to  the 
atmosphere  (evapotranspiration).  The  causal  explanation  given  for  why  the  leaves 
are  wilting  must  be  mapped  onto  the  model.  Precisely,  the  rate  of  evapotranspiration 
exceeds  the  rate  of  water  uptake  by  the  roots  from  the  soil.  Without  the  model  there 
is  no  way  to  understand  the  relationship  between  water  in  the  leaves  and  water  in 
the  roots. 

It  is  possible  that  very  sophisticated  models  of  soil,  such  as  would  be  provided 
by  soil  scientists,  could  be  included.  In  this  approach,  language  can  be  very  closely 
coupled  with  models.  This  will  aid  not  only  in  the  language  understanding  process, 
but  also  in  the  model  building  process.  This  topic  is  addressed  in  greater  detail  in 
Chapter  6. 

These  examples  illustrate  category  effects  occurring  at  the  semantic  level.  Several 
different  cognitive  models  of  soil  are  needed  to  understand  these  sentences.  Variation 
in  use  of  the  word  "soil"  over  the  sample  sentences  indicates  several  different  categories 
which  treat  soil  as  a  growth  medium,  as  a  location,  and  as  an  entity.  This  discussion 
is  continued  in  the  example  in  Section  5.4. 

5.2   Applying  the  Data  Model  to  Linguistic  Objects 

This  section  shows  how  the  data  model  and  conceptual  clustering  techniques  de- 
scribed in  Sections  2.2  and  4.2  can  be  applied  to  natural  language  processing  and,  in 
particular,  to  the  problem  of  lexical  acquisition.  With  an  additional  data  type  needed 
for  representing  grammars,  CANDIDE  can  store  syntactic  information  in  addition  to 
domain-specific  semantic  information.  It  will  be  shown  how  CANDIDE  can  be  used 
to represent a class of grammar formalisms known as the unification grammars.
Using the data model leads to further integration of language processing with database
operations. The same categorization principles used to organize data can be applied
to  language  processing  and  lead  to  a  better  account  of  the  category  effects  in  language 
illustrated  in  the  previous  section. 

5.2.1  Semantic  Structures 

The semantics of a natural language expression can be represented in a standard
way using CANDIDE class and instance descriptions. An expression such as "The
soil is dry" is represented by an Instance description:

Soil
  PARENTS: Storage Device
  ATTRIBUTES
    Moisture Level: Dry

as  was  described  in  Section  2.2. 

5.2.2  Syntactic  Structures 

Terminological knowledge representation systems are useful for representing
unification grammars [107,52,106,37]. Unification grammars, which include
Lexical-Functional Grammar (LFG), build syntactic structures based on a common
attribute-value pair notion. These structures can be stored as classes and instances
in CANDIDE.

A  new  relationship  is  needed  to  describe  grammars  because  of  the  ordering  of 
expressions  within  a  grammar  rule.  An  LFG-style  grammar  rule  (Section  2.6.1)  such 
as: 

S → NP (↑ SUB = ↓), VP (↑ = ↓).

can  be  written  as  a  grammatical  object: 

S
  SUPERCLASS: Sentence
  ATTRIBUTE RESTRICTIONS
    1: EXACTLY 1 Class NP
    2: EXACTLY 1 Class VP
    HEAD SUB: EXACTLY 1 PATH(1,HEAD)
    HEAD: EXACTLY 1 PATH(2,HEAD)

Such generic rules are expressed as Class objects. In this example, the rule and the
object both express the grammatical relation that a sentence S can be constructed from
a  phrase  consisting  of  a  noun  phrase  NP  followed  by  a  verb  phrase  VP.  The  HEAD 
(main  constituent)  of  the  sentence  is  the  verb  phrase.  The  HEAD  SUB  (subject)  of 
the  sentence  is  the  noun  phrase. 

In  the  object  notation,  the  linear  orientation  of  the  phrasal  elements  is  enforced 
using  numbers  (1,2...).  In  addition,  a  new  data  type  called  PATH  has  been  introduced: 

PATH(d1, d2, ..., dn)

The PATH takes an ordered sequence of elements as objects. The sequence defines
a path that can be used to retrieve a certain value. It is like a "dot" operator used
in data modeling languages. The elements of the path are other objects, including
attributes. The path consists of a chain of attribute-value relationships. For example,
PATH(1,HEAD)  appearing  as  a  value  of  attribute  HEAD  SUB  in  S  is  used  to  find  the 
sentence's  subject  (HEAD  SUB).  The  path  begins  at  attribute  "1"  and  ends  at  the 
HEAD  value  for  this  attribute.  In  other  words,  the  noun  phrase  is  the  subject  of  the 
sentence.  Similarly  for  the  HEAD,  PATH(2,HEAD)  says  that  the  main  constituent 
of  the  sentence  is  obtained  from  the  HEAD  of  attribute  "2,"  in  other  words  the  main 
verb. 

Grammatical  objects  can  be  nested  just  as  any  other  database  object.  The  Ex- 
tended Description  Graph  can  be  created  for  any  object  as  described  in  Section  4.2.3.2. 
A  parse  tree  in  unification  grammars  is  represented  by  such  a  graph.  The  parse  tree 
can  then  be  thought  of  as  a  directed  acyclic  graph  (DAG).  The  Extended  Description 
Graph  may  in  general  have  cycles  as  explained  in  Section  4.2.3.7.  Such  cycles  will 
not  occur  in  grammatical  objects. 

A  unification  grammar  can  now  be  defined  entirely  in  terms  of  data  objects: 


A UNIFICATION GRAMMAR is a triplet G = (L, G, S) where

• L is the lexicon. L is a set of tuples (t, c1, c2, ..., cn) where

  - t is a terminal

  - c1, c2, ..., cn are class objects representing the categories for t.

• G is a set of grammatical objects. Each object has the form:

    L
      SUPERCLASS: non-terminal
      ATTRIBUTE RESTRICTIONS
        1: EXACTLY 1 Grammar_Class
        2: EXACTLY 1 Grammar_Class
        3: EXACTLY 1 Grammar_Class
        a1: EXACTLY 1 PATH
        a2: EXACTLY 1 PATH
        a3: EXACTLY 1 PATH

  where the SUPERCLASS is some non-terminal symbol for the left side of the
  context-free rule; 1, 2, 3... are attributes for each term on the right side of
  the rule (terminals or non-terminals); and a1, a2, a3... are expressions that
  augment the context-free rule. All grammatical objects are subsumed by a class
  object called Grammar_Class.

•  S  is  a  class  object  for  the  start  symbol  of  the  grammar. 

When  a  phrase  or  sentence  is  parsed,  the  objects  representing  the  grammar  rules 
used  in  the  parse  are  instantiated.  This  results  in  objects  containing  attribute-value 
pairs  in  which  the  values  are  subtrees  of  the  parse  tree,  thus  forming  a  DAG.  As  an 
example, the object representing the syntax of the sentence "John ate dinner" applies
the  grammar  rule  described  earlier  (Figure  5.1). 

S
  1: NP
    1: N
      HEAD: John
    HEAD: PATH(1,HEAD)
  2: VP
    1: V
      HEAD: Ate
    2: NP
      1: N
        HEAD: Dinner
      HEAD: PATH(1,HEAD)
    HEAD: PATH(1,HEAD)
    HEAD OBJ: PATH(2,HEAD)
  HEAD SUB: John
  HEAD: Ate
    SUB: John
    OBJ: Dinner

Figure 5.1: John ate dinner.

Unfortunately, the advantages gained by uniformly representing linguistic information
using the same data model are partially offset by the complexity of the resulting
notation. The notation here is simplified by using the parent of each nested object
as the name of the object and by writing each attribute-value relationship with a ":".
The sentence S contains two nested objects for the NP and VP. The NP is simply a
single noun, "John." The VP consists of a verb V followed by a noun phrase NP.

The value of attribute HEAD contains a semantic representation of the sentence.
This representation is obtained during the parsing process by following PATHs
specified in the grammatical objects. The HEAD value for S was obtained by connecting
the HEAD for the VP and attaching HEAD SUB and HEAD OBJ. For example,
the HEAD of the sentence is identified by PATH(2,HEAD) given in the grammatical
class object for S described earlier. "Ate" is obtained by searching this path:

2 → HEAD → 1 → HEAD → Ate

Attribute "2" is the VP, the HEAD of the VP is given by PATH(1,HEAD) where
"1" is V, and the HEAD of V is "Ate." HEAD SUB and HEAD OBJ were obtained
by a similar procedure.
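
The path-following step just described can be sketched in a few lines. This is a minimal illustration that assumes grammatical objects are stored as nested dictionaries; the `Path` class and the `follow` function are hypothetical names, not part of CANDIDE.

```python
class Path(tuple):
    """A PATH(d1, ..., dn) value: an ordered sequence of attribute names."""


def follow(obj, path):
    """Walk `path` from `obj`, resolving any PATH value met on the way.

    A PATH stored inside an object is interpreted relative to that object.
    """
    current = obj
    for attribute in path:
        value = current[attribute]
        if isinstance(value, Path):
            value = follow(current, value)
        current = value
    return current


# The parse of "John ate dinner" from Figure 5.1, before HEADs are filled in.
sentence = {
    1: {1: {"HEAD": "John"}, "HEAD": Path((1, "HEAD"))},
    2: {
        1: {"HEAD": "Ate"},
        2: {1: {"HEAD": "Dinner"}, "HEAD": Path((1, "HEAD"))},
        "HEAD": Path((1, "HEAD")),
        "HEAD OBJ": Path((2, "HEAD")),
    },
}

head = follow(sentence, (2, "HEAD"))        # 2 -> HEAD -> 1 -> HEAD
subject = follow(sentence, (1, "HEAD"))
obj = follow(sentence[2], ("HEAD OBJ",))
```

Resolving each stored PATH relative to the object that holds it is what lets the same rule, PATH(1,HEAD), mean "the noun" inside an NP and "the verb" inside a VP.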

Figure 5.2 shows a complex example for the sentence, "John quietly ate a delightful
dinner at Leonardo's." Here the adverb "quietly" and prepositional phrase "at
Leonardo's" are modifiers of the head verb "ate." The determiner "a" and adjective
"delightful" modify "dinner." Modifiers are associated with the elements they modify
through "HEAD MOD" attributes.

5.2.3   Subsumption  of  Syntactic  Structures 

The sentence "John ate dinner" syntactically subsumes the sentence, "John quietly
ate a delightful dinner at Leonardo's." That is, the object representing the DAG for
the first sentence subsumes the object representing the DAG for the second sentence.
Establishing a subsumption relationship between syntactic expressions is a useful part
of the conceptual clustering procedure.

Procedure: DAG Subsumption

DAG subsumption is computed as follows. Consider two DAGs, PT1 and PT2,
representing two parse trees. PT1 subsumes PT2 if PT1 is contained within PT2.
The subsumption algorithm tries to map each node and arc in PT1 to a corresponding
node and arc in PT2. The height of a DAG, H(DAG), is the length of the longest
of all paths from the root node to some leaf. PT1 subsumes PT2 if:

• H(PT1) < H(PT2)

• Some node in PT2 corresponds to the root node of PT1. There may be more
than one such node. That is, PT1 may map to one or more subtrees in PT2.

S
  1: NP
    1: N
      HEAD: John
  2: VP
    1: ADV
      HEAD: Quietly
    2: V
      HEAD: Ate
    3: NP
      1: DET
        HEAD: A
      2: ADJ
        HEAD: Delightful
      3: N
        HEAD: Dinner
      HEAD: PATH(3,HEAD)
      HEAD MOD: PATH(1,HEAD)
                PATH(2,HEAD)
    4: PP
      1: PREP
        HEAD: at
      2: N
        HEAD: Leonardo's
      HEAD: PATH(1,HEAD)
      HEAD OBJ: PATH(2,HEAD)
    HEAD: PATH(2,HEAD)
    HEAD OBJ: PATH(3,HEAD)
    HEAD MOD: PATH(1,HEAD)
              PATH(4,HEAD)
  HEAD SUB: John
  HEAD: Ate
    SUB: John
    OBJ: Dinner
      MOD: A
      MOD: Delightful
    MOD: Quietly
    MOD: At
      OBJ: Leonardo's

Figure 5.2: John quietly ate a delightful dinner at Leonardo's.

• At each matched pair of nodes M1 and M2 where M1 in PT1 maps to M2 in
PT2, the children of M1 must map to the children of M2.

Possible cases:

- M1 and M2 have the same number of children.

Each child M1' of M1 maps to a corresponding child M2' of M2 with the
same attribute label number. Then M1' = M2', the DAG rooted at M1'
subsumes the DAG rooted at M2', or M1' corresponds to the HEAD node
of the DAG rooted at M2'.

- M2 has more children than M1.

If M1 only has one child, then it must map to the HEAD node of the DAG
rooted at M2.

If M1 has more than one child, then the components of the DAG rooted
at M1 (HEAD, SUB, OBJ, OBJ2, or MOD) must map to corresponding
components in the DAG rooted at M2.

• At matched leaves L1 and L2, either L1 = L2, or if L1 is a class then L2 must
be contained within L1.
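
The component-matching part of this procedure can be illustrated with a simplified sketch. It is not the full node-and-arc mapping: it assumes the HEAD structures have already been resolved (as at the bottom of Figures 5.1 and 5.2), stores them as nested dictionaries with MOD held as a list, and omits both the height test and class-valued leaves. The names `head_of` and `subsumes` are hypothetical.

```python
def head_of(component):
    """Return the HEAD word of a component; a bare word is its own head."""
    return component["HEAD"] if isinstance(component, dict) else component


def subsumes(general, specific):
    """True if every component of `general` has a match in `specific`."""
    if not isinstance(general, dict):
        # A leaf matches an equal leaf or the HEAD of a larger structure.
        return general == head_of(specific)
    if not isinstance(specific, dict):
        return False
    for label, g_value in general.items():
        if label not in specific:
            return False
        s_value = specific[label]
        if label == "MOD":
            # Each modifier in the general structure needs some counterpart.
            if not all(any(subsumes(g, s) for s in s_value) for g in g_value):
                return False
        elif not subsumes(g_value, s_value):
            return False
    return True


simple = {"HEAD": "Ate", "SUB": "John", "OBJ": "Dinner"}
detailed = {
    "HEAD": "Ate",
    "SUB": "John",
    "OBJ": {"HEAD": "Dinner", "MOD": ["A", "Delightful"]},
    "MOD": ["Quietly", {"HEAD": "At", "OBJ": "Leonardo's"}],
}
```

Under these assumptions the structure for "John ate dinner" subsumes the structure for the longer sentence, but not the reverse, because the longer sentence carries modifiers that the shorter one lacks.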

5.2.4   Intersection  of  Syntactic  Structures 

An Intersection between two syntactic structures can also be computed. For
example, the Intersection of the syntactic structures representing the two sentences:

How many trunks can the Senegal Date Palm have?
Does the Princess Palm have flowers and fruit?

is the pattern <AUX> <NP> <VT> that is present in both sentences. The complete
Intersection is represented by the grammatical object:

FRAGMENT
  1: AUX
  2: NP
    1: DET
      HEAD: The
    2: NOUN
      HEAD: CLASS Palm
    HEAD MOD: PATH(1,HEAD)
  3: VT
    HEAD: Have
  HEAD: PATH(3,HEAD)
  HEAD SUB: PATH(2,HEAD)
  HEAD MODE: PATH(1,HEAD)

The  Intersection  is  an  incomplete  grammatical  expression  because  there  is  no  OBJ 
for  the  transitive  verb  (VT)  "Have,"  hence  it  is  labeled  as  a  FRAGMENT.  In  both 
sentences,  the  verb  is  the  head  of  the  fragment,  the  auxiliaries  "Does"  and  "Can" 
(AUX)  set  the  MODE  of  the  verb,  and  the  noun  phrase  is  the  subject  of  the  verb. 
This Intersection forms a useful class in that if another pattern is observed of the
form <AUX> <NP> <VT>, then the corresponding subject and modal relationships
can be applied as defaults.

Procedure: DAG INTERSECT

The  procedure  for  computing  the  Intersection  of  two  syntactic  structures  is  cis 
follows.  Let  PTl  and  PT2  be  DAGs  representing  each  structure.  Then 

INTERSECT{PTl,PT2)  =  PT 

where  PT  is  a  DAG  in  which  each  node  and  link  can  be  mapped  by  a  function  to 
corresponding  pairs  of  nodes  and  links  in  PTl  and  PT2  such  that: 

•  The root node in PT maps to a matching pair (PT1r, PT2r) of nodes from PT1
and PT2 which are root nodes of DAGs within PT1 and PT2. The matching
pairs must correspond to HEADs of the DAGs, and the HEADs must be of
compatible types (a verb head must be matched with a verb head, a noun
with a noun, and so forth, but not a noun with a verb). Compatible types are
determined by the type taxonomy (Verb subsumes Transitive Verb).

•  The DAGs from the root node of PT must map to matching pairs of DAGs from
PT1r and PT2r. The matched pairs must have matching components (SUB,
OBJ, OBJ2, MOD).

•  Leaves  of  each  DAG  may  correspond  to  particular  instances  or  classes.  Matched 
instances  must  be  identical.  If  an  instance  is  mapped  to  a  class,  the  instance 
must  be  a  member  of  the  class. 
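
The three conditions above can be sketched in code. The following is only a minimal
illustration under stated assumptions, not the CANDIDE implementation: the names Node,
TAXONOMY, MEMBER_OF, and intersect are hypothetical, the structures are simplified to
trees, and only the two given roots are compared (the procedure above also allows the
roots to be DAGs nested within PT1 and PT2).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical type taxonomy (child -> parent): Verb subsumes TransitiveVerb.
TAXONOMY = {"TransitiveVerb": "Verb"}
# Hypothetical class membership for leaf instances.
MEMBER_OF = {"Senegal Date Palm": "Palm", "Princess Palm": "Palm"}


@dataclass
class Node:
    type: str                # syntactic type of the head
    head: str                # head word, instance name, or class name
    is_class: bool = False   # True when `head` names a class
    parts: Dict[str, "Node"] = field(default_factory=dict)  # SUB, OBJ, OBJ2, MOD


def subsumes(general: str, specific: Optional[str]) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = TAXONOMY.get(specific)
    return False


def match_type(t1: str, t2: str) -> Optional[str]:
    """Return the more general of two compatible types, else None."""
    if subsumes(t1, t2):
        return t1
    if subsumes(t2, t1):
        return t2
    return None


def match_head(n1: Node, n2: Node) -> Optional[Tuple[str, bool]]:
    """Instances must be identical; an instance must belong to a matched class."""
    if n1.is_class == n2.is_class:
        return (n1.head, n1.is_class) if n1.head == n2.head else None
    cls, inst = (n1, n2) if n1.is_class else (n2, n1)
    return (cls.head, True) if MEMBER_OF.get(inst.head) == cls.head else None


def intersect(n1: Node, n2: Node) -> Optional[Node]:
    """Keep only the heads and components that both structures share."""
    t = match_type(n1.type, n2.type)
    h = match_head(n1, n2)
    if t is None or h is None:
        return None
    out = Node(t, h[0], h[1])
    for role in n1.parts.keys() & n2.parts.keys():
        shared = intersect(n1.parts[role], n2.parts[role])
        if shared is not None:
            out.parts[role] = shared
    return out
```

In this sketch a component present in only one structure (for example an OBJ) is simply
dropped, which is what leaves the result a FRAGMENT in the sense used above.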

5.2.5   Exception  Condition 

As  discussed  in  Section  4.2.4,  the  Exception  Condition  is  a  condition  raised  when 
it  appears  that  a  new  instance  may  possibly  belong  to  an  existing  class,  yet  fails 
to  meet  the  conditions  specified  in  the  class  description.  The  Exception  Condition 
also provides a general approach to identifying lexical gaps. The need for such an
approach is described by Zernik [132], who defines a lexical gap as a new word use not
covered by existing patterns in the lexicon.

Consider  these  examples  of  "make"  provided  by  Zernik: 

1.  John  made  a  great  meal. 

2.  Further  modifications  made  the  project  economically  attractive. 

3.  John  made  Mary  a  cake. 

4.  That  made  it  the  most  expensive  cleanup. 

5.  Mexico  made  Salinas  president. 

Suppose  patterns  already  exist  in  the  lexicon  for  the  use  of  "made"  which  appears  in 
sentences  1  and  3,  but  there  are  no  patterns  corresponding  to  sentences  2,  4,  and  5. 
The  problem  is  that  the  clustering  algorithm  must  avoid  applying  the  interpretation 
for  sentence  1  to  sentence  2,  and  also  must  avoid  applying  the  interpretation  for 
sentence 3 to sentences 4 and 5. There would be a tendency to make this mistake
since sentences 1 and 2 are both of the form ( NP )( made )( NP ), and sentences
3, 4, and 5 are all of the form ( NP )( made )( NP )( NP ). For example, the
algorithm might mistakenly interpret sentence 5 as Mexico having somehow created
a president and given it to Salinas, just as in sentence 3 John made a cake and gave it
to Mary. With insufficient information, such interpretations become viable. In fact,
sentences 2, 4, and 5 represent lexical gaps since they introduce new patterns for
"make" that differ from the existing patterns for 1 and 3. It will be shown how the
Exception Condition is raised for 2, 4, and 5, thus signaling something potentially
unusual about these sentences. This will result in the identification of the lexical
gaps.

Suppose  one  lexical  entry  for  "make"  corresponds  to  the  pattern  of  sentence  1: 

MAKE_1
    Phrase: ( NP1 )( make )( NP2 )
    HEAD SUB: NP1
        SUPERCLASS: Person
    HEAD OBJ: NP2
        SUPERCLASS: Physical Object
    HEAD: Create
        SUPERCLASS: Action
        Agent: NP1
        Object: NP2
        Mechanism: ATLEAST ( Mechanism )

Again  the  notation  has  been  simplified;  for  example,  the  value  for  Phrase  should 
be  written  out  as  a  complete  DAG.  MAKE_1  describes  the  basic  notion  of  "make" 
with  the  SUB  being  a  Person  creating  some  OBJ  which  is  a  Physical  Object.  There 
is  experimental  data  from  childhood  language  development  [70]  indicating  that  this 
class  is  one  of  the  first  patterns  acquired  for  the  verb  "make."  Other  uses  of  "make" 
are learned in contrast to MAKE_1. The Create model requires that a Mechanism for
creating the Object NP2 be identifiable and that NP1 be the Agent of the action of
creating. Sentence 1 is an instance of MAKE_1. The mechanism for making a great
meal can be specified, namely the steps involved in cooking. The model for cooking
requires the chef as an agent.

In considering sentence 2, the exception condition is raised with MAKE_1 since
sentence 2 has the ( NP1 )( make )( NP2 ) pattern in common with other instances
of the MAKE_1 class. Yet sentence 2 fails to meet the MAKE_1 class description for
several reasons. Not only do NP1 ("modifications") and NP2 ("the project") fail to
fit the domains specified in the description, but the adjective phrase "economically
attractive" must also be accounted for. It is conceivable that MAKE_1 can be modified
to accept "modifications" and "the project" as acceptable arguments for NP1
and NP2. Then a sentence such as "Modifications made the project" would fit the
model. But in sentence 2 it is the adjective phrase "economically attractive" that
makes the difference. Where does this adjective phrase get attached? There is no
syntactic justification for attaching it to NP2 since that would result in an unusual
post-noun modifier. That is, it would result in a noun phrase like "project attractive."
Likewise, the adjective phrase does not modify the verb "make." That would require
an adverbial form, "attractively." Note that it is not impossible for this adjective
phrase to attach in one of these two places; it is just very unlikely. It is unlikely in the
context of default reasoning, which is explained further in Section 5.2.7.

The Exception Condition suggests two hypotheses: (1) sentence 2 is a member
of the MAKE_1 class and the MAKE_1 class must be modified; or (2) sentence 2
is the start of a new class representing a new sense of the word MAKE. At least
the Exception Condition has blocked an unquestioned acceptance of the first choice.
With no additional information, the Exception Condition must be resolved through
interaction with the user. On the other hand, if other phrases such as "cut it short,"
"paint it green," or "dig it deep" are available in the database, then the
( VERB )( NP )( ADJ ) pattern would be recognized immediately.

Sentences 3, 4, and 5 are of the form ( NP1 )( make )( NP2 )( NP3 ). Suppose that
knowledge about indirect objects is already available from phrases such as sentence
3. A slight variation on MAKE_1, called MAKE_2, would accommodate the indirect
object. For example, let NP3 be the direct object, add NP2 as the indirect object
with the domain of its argument as ( Person ), and modify the Create model to make
NP2 the recipient of NP3. Notice in this model that NP2 and NP3 represent two
distinct entities, and that NP2 must be capable of owning NP3. As in the previous
example, sentences 4 and 5 will raise an exception condition with MAKE_2.

MAKE_2
    Phrase: ( NP1 )( make )( NP2 )( NP3 )
    HEAD SUB: NP1
        SUPERCLASS: Person
    OBJ1: NP2
        SUPERCLASS: Person
    OBJ2: NP3
        SUPERCLASS: Physical Object
    HEAD: Create
        SUPERCLASS: Action
        Agent: NP1
        Object: NP3
        Mechanism: ATLEAST ( Mechanism )
    Own
        Agent: NP2
        Object: NP3

In  sentence  4,  "it"  will  fail  to  meet  the  domain  restriction  for  NP2.  "It"  is  not 
capable  of  owning  a  "cleanup."  Furthermore,  a  "cleanup"  is  not  a  Physical  Object, 
though  a  mechanism  for  making  a  cleanup  could  be  explained.  In  sentence  5,  NP2 
("Salinas")  does  fit  the  domain  restriction,  but  the  causal  model  of  President  will  fail 
to  explain  the  mechanism  for  making  a  "president,"  at  least  not  any  mechanism  based 
on  the  citizens  of  a  country.  The  knowledge  the  system  has  about  making  presidents 
would  only  include  such  instances  as,  "Harvard  has  produced  several  presidents," 
or "Courage and kindness make a good president." Furthermore, it is difficult to
explain the Own relation between "Salinas" and "president," assuming "president"
represents an entity distinct from NP2. There are no situations in which a president
can be owned by another person unless a bribe is given, and that condition is not
justified by anything in sentence 5. Thus, in sentences 4 and 5 it is the cognitive
models of creation and ownership that cause MAKE_2 to break down.

Now,  these  models  must  exist  in  the  database  in  order  for  these  inferences  to 
be  made.  Presumably  these  models  are  available  from  other  sentences  which  have 
already  been  analyzed.  Note  that  it  is  possible  to  contrive  an  extremely  elaborate 
set  of  conditions  in  which  the  phrase  "Mexico  made  Salinas  president"  could  match 
MAKE_2.  Suppose  citizens  of  Mexico  conspired  to  put  someone  through  Harvard, 
got  that  person  elected  president,  paid  the  person  a  bribe,  and  turned  control  of  the 
president  over  to  Salinas.  But  these  conditions,  if  true,  would  lead  to  all  sorts  of  new 
exceptions.  By  this  time,  the  point  has  already  been  made  through  the  Exception 
Condition  that  sentences  4  and  5  are  very  unusual  members  of  MAKE_2,  if  they  are 
members  at  all,  and  that  is  sufficient  to  activate  other  mechanisms  for  interpreting 
these  sentences. 
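
The domain-restriction part of this test can be sketched as follows. This is a minimal
illustration under stated assumptions rather than the system's actual procedure: the
ISA table, the role names, and the check function are hypothetical, and the causal
checks (identifying a Mechanism, explaining the Own relation) are omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy of fillers (child -> parent), for illustration only.
ISA = {
    "John": "Person",
    "meal": "Physical Object",
    "modifications": "Event",
    "project": "Activity",
}


def is_a(entity: Optional[str], cls: str) -> bool:
    """True if `entity` is `cls` or a descendant of it in ISA."""
    while entity is not None:
        if entity == cls:
            return True
        entity = ISA.get(entity)
    return False


@dataclass
class ClassDescription:
    name: str
    pattern: Tuple[str, ...]   # phrasal pattern shared by class members
    domains: Dict[str, str]    # role -> required superclass


@dataclass
class Instance:
    pattern: Tuple[str, ...]
    fillers: Dict[str, str]
    leftover: Tuple[str, ...] = ()   # constituents the pattern cannot attach


def check(instance: Instance, cls: ClassDescription) -> Tuple[str, List[str]]:
    """Return 'unrelated', 'member', or 'exception' plus the reasons."""
    if instance.pattern != cls.pattern:
        return "unrelated", []
    reasons = []
    for role, required in cls.domains.items():
        filler = instance.fillers.get(role)
        if filler is None or not is_a(filler, required):
            reasons.append(f"{role}: {filler!r} is not a {required}")
    reasons += [f"unattached constituent: {c}" for c in instance.leftover]
    return ("member", []) if not reasons else ("exception", reasons)


MAKE_1 = ClassDescription(
    "MAKE_1", ("NP", "make", "NP"),
    {"SUB": "Person", "OBJ": "Physical Object"},
)
sentence_1 = Instance(("NP", "make", "NP"), {"SUB": "John", "OBJ": "meal"})
sentence_2 = Instance(
    ("NP", "make", "NP"),
    {"SUB": "modifications", "OBJ": "project"},
    leftover=("economically attractive",),
)
```

An "exception" result corresponds to the situation described above: the instance shares
the phrasal pattern with the class, so it may belong, yet it violates the class
description, so membership cannot be accepted without question.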

5.2.6   Exception Handling and Schema Evolution

If the Exception Condition is decided in favor of including the exception in
the class, then the class must be modified to accommodate the exception. Consider
MAKE_1 given in the previous section in relation to:

1.  John  made  a  great  meal. 

2.  John  made  a  great  noise. 

3.  The  engine  made  a  great  noise. 

Sentence 1 fits MAKE_1 perfectly, yet sentence 2 does not because "noise" is not a
physical entity. However, sentence 2 does fit the causal model in MAKE_1. An explanation
can be generated. The mechanism for creating noise (shouting, clapping,...)
involving the subject as the agent can be produced. The same holds for sentence 3.
Nevertheless, the exception condition is raised and again two hypotheses are formed.
First, it is possible to modify MAKE_1 to allow "noise" as NP2. Second, making a
noise is in a different class. The modification needed in MAKE_1 is simple: expand
the domain of OBJ to include noise and expand SUB to include engines. More accurately,
the original MAKE_1 is retained, but new versions are created as other Class
descriptions to accommodate all these variations. Even after only these few examples,
the original MAKE_1 has evolved into several different but closely related subclasses.
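
A minimal sketch of this kind of evolution is given below, assuming a class description
that is reduced to a table of accepted domains per role. The names ClassDescription,
evolve, MAKE_1a, and MAKE_1b are hypothetical and chosen only for illustration. The
point shown is that the original description is left intact while widened variants are
derived from it.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ClassDescription:
    name: str
    domains: Dict[str, FrozenSet[str]]   # role -> accepted superclasses


def evolve(original: ClassDescription, new_name: str,
           role: str, extra_class: str) -> ClassDescription:
    """Derive a variant whose `role` also accepts `extra_class`.

    The original description is not modified; a new one is returned.
    """
    widened = dict(original.domains)
    widened[role] = original.domains[role] | {extra_class}
    return replace(original, name=new_name, domains=widened)


MAKE_1 = ClassDescription(
    "MAKE_1",
    {"SUB": frozenset({"Person"}), "OBJ": frozenset({"Physical Object"})},
)
# "John made a great noise": widen OBJ to accept noise.
MAKE_1a = evolve(MAKE_1, "MAKE_1a", "OBJ", "Noise")
# "The engine made a great noise": additionally widen SUB to accept engines.
MAKE_1b = evolve(MAKE_1a, "MAKE_1b", "SUB", "Engine")
```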

5.2.7   Default Reasoning

Any  utterance,  whether  studied  in  isolation  as  in  these  examples  or  as  part  of 
an  involved  conversation,  is  subject  to  ambiguity.  Default  reasoning  is  a  powerful 
mechanism  for  reducing  ambiguity.  When  several  alternative  interpretations  of  an 
utterance  exist,  default  reasoning  can  be  used  to  rank  the  alternatives  and  select 
the most plausible. Default reasoning is necessary because in almost every utterance
important conditions are left unspecified.

Default reasoning is based on probability or frequency of occurrence. Defaults
are  obtained  by  reasoning  over  the  set  of  instances  as  described  in  Section  4.2.6.  A 
default  value  is  the  value  that  occurs  more  frequently  than  any  of  the  alternatives. 
Default  reasoning  effects  can  easily  be  illustrated  by  fill-in-the-blank  problems: 

1.  Robin ____ robbed from the rich and gave to the poor.

2.  Have a nice ____

3.  Let me take this ____ to welcome you here.

The  answers  seem  obvious,  but  may  not  always  be  right.  The  answer  to  sentence  1 
is  "HUD."  This  sentence  appeared  in  a  newspaper  story  about  a  scandal  involving 
an  employee  in  the  Department  of  Housing  and  Urban  Development  in  which  funds 
were  diverted  to  the  poor. 
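
The frequency-based definition of a default given above can be sketched in a few lines.
This is an illustration only: the function names are hypothetical, the observation
counts in the usage note are invented, and the actual procedure is the one given in
Section 4.2.6. A tie is treated as "no default," since no value then occurs more
frequently than the alternatives.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


def default_value(observed: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the value that occurs strictly more often than any alternative."""
    top = Counter(observed).most_common(2)
    if not top:
        return None                      # no instances, so no default
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None                      # tie: no single most frequent value
    return top[0][0]


def rank_alternatives(observed: Iterable[Hashable]) -> List[Tuple[Hashable, int]]:
    """Rank alternative values from most to least frequently observed."""
    return Counter(observed).most_common()
```

For example, if the filler of the blank in "Robin ____ robbed from the rich" had been
observed as "Hood" many times and as "HUD" once, default_value would return "Hood",
even though "HUD" was the correct answer in the newspaper story.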

Recall  this  example  from  Section  5.2.5: 

1.  Modifications  made  the  project  economically  attractive. 

2.  Modifications made the economically attractive project.

The problem was to avoid interpreting sentence 1 as though it were expressing the
same idea as sentence 2. The interpretation of sentence 2 is that modifications made
the project, and this interpretation fits the MAKE_1 class. The more likely interpretation
for sentence 1 is that modifications changed the state of the project to one
of being economically attractive. This is correct in the sense of being the default
interpretation. This point can be illustrated with one of the steps in the default
reasoning process. One reason, it was argued, that sentence 1 does not have the
same interpretation as sentence 2 is that doing so requires attaching the adjective
phrase "economically attractive" to the noun phrase "the project" as a post-modifier.
Although adjectives can be used as post-modifiers, such use is very rare. This is especially
true of the adjective "attractive." That is, we say "attractive project" and
not "project attractive." On the other hand, there are cases when a post-modifier is
not unusual. This is the case with "available." Thus, "all cars available" sounds just
as natural as "all available cars." This leads to a genuine ambiguity with regard to
"made" in:

The company made all cars available.

Unlike sentence 1, this sentence can be interpreted by MAKE_1 by attaching "available"
to "cars." The default reasoner is unable to reject this interpretation as it could
with sentence 1.

As  another  example,  consider  the  following  relationships  between  a  country  and 
a  president: 

1.  Mexico  made  a  president. 

2.  Mexico  elected  a  president. 

3.  Cuba  elected  a  president. 

Sentence 1 seems odd because the instances in which a president is spoken about as
being made (that is, created) by a country are rare. On the other hand, there are
many examples of sentence 2. Sentence 3 would make headlines since it is both rare
and also violates our model about the constitution of Cuba. Thus, although all three
sentences can easily be understood, they illustrate how an utterance can be unusual
on the basis of frequency of occurrence. When two interpretations are possible, the
unusual interpretation would be rejected. Thus, in "Mexico made Salinas president,"
the interpretation that Mexico had manufactured a president is rejected in favor of
the interpretation that it had elected a president.

1.  Initialization

2.  Recognition of a New Case
    •  Input a New Phrase
    •  Word Recognition
    •  Parsing
    •  Identify Related Cases and Models

3.  Reasoning Step
    •  Exact Match
    •  Map to Related Cases
    •  Map to Related Models
    •  Rank Ambiguous Interpretations
    •  Dead End

4.  Validation Step

Figure 5.3: Lexical Acquisition Algorithm

5.3   Lexical Acquisition Algorithm

Given  the  procedures  illustrated  in  the  previous  section,  it  is  now  possible  to  state 
the  lexical  acquisition  algorithm.  The  algorithm  consists  of  the  steps  shown  in  Figure 
5.3.  The  algorithm  is  essentially  the  conceptual  clustering  algorithm  presented  in 
Chapter  4  modified  for  linguistic  data.  It  makes  use  of  the  same  procedures  and 
functions,  but  they  have  been  modified  as  described  in  the  previous  sections. 

The lexical acquisition algorithm is not intended to be fully automatic. One
reason is that the learning techniques are based on having a pre-existing lexicon
with which to compare new cases. Other reasons why conceptual clustering cannot
be fully automated are given in Section 4.2.4. Initially the job of creating a lexicon
must be done entirely by hand. At first the lexical acquisition algorithm will act as
an assistant to the lexicon builders by suggesting possible relationships for approval
by the builders. The algorithm will improve with exposure to more cases, making
more and better recommendations as the lexicon evolves. The algorithm's ability to
recognize new phrases will continue to improve with experience. This performance is
illustrated by the sample data presented in Section 5.4, and has been verified by the
study of dozens of lexical entries.

The  lexicon  is  treated  as  part  of  the  database,  and  CANDIDE  is  used  to  represent 
linguistic  objects.  A  particular  lexical  entry  for  a  single  word  is  created  by  examining 
a  dataset  of  sentences  containing  the  word.  A  dataset  of  sentences  is  obtained  by 
scanning  a  corpus  of  text  from  various  sources.  Each  sentence,  more  often  a  phrase 
within the sentence containing the word under study, is used to construct a case
represented by a CANDIDE instance. Eventually such an instance contains the following
information:

1.  The  word 

2.  The  input  phrase  containing  the  word 

3.  Phonetic  Marking 

4.  Morphologic  Features 

5.  Syntactic  Features  (Parse  Tree) 

6.  Semantic  Structure 

This must include enough of the phrasal environment to capture all the components
of the phrase, sentence, or dialogue that affect the use of the word. Although
the analysis concentrated mainly on phrasal patterns as the unit of study, the same
techniques would apply to sentence and dialogue-level domains of discourse. Initially,
only the word and input phrase are available. The job of the natural language processor
is to build each case by adding the other components.
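
As a rough picture of such a case, the six components listed above can be written as a
record in which only the first two fields are filled at input time. This is a sketch
under stated assumptions: LexicalCase and its field names are hypothetical and do not
reflect the actual CANDIDE instance syntax.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class LexicalCase:
    word: str                              # 1. the word under study
    phrase: str                            # 2. the input phrase containing it
    phonetic: Optional[str] = None         # 3. phonetic marking
    morphology: List[str] = field(default_factory=list)  # 4. morphologic features
    parse_tree: Optional[Any] = None       # 5. syntactic features (parse tree)
    semantics: Optional[Any] = None        # 6. semantic structure

    def missing(self) -> List[str]:
        """Components the language processor still has to add."""
        names = ["phonetic", "morphology", "parse_tree", "semantics"]
        return [name for name in names if not getattr(self, name)]


case = LexicalCase(word="soil", phrase="soil fertility")
```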

Although the algorithm is designed to work on four levels, namely phonetic,
morphologic, syntactic, and semantic, the first two levels can be bypassed to an extent by
using whole words of text as input. Then it must be assumed that whole words are
the basic lexical unit. The syntactic features of the phrase are represented by DAG
structures as described in Section 5.2.2. The semantic features are represented using
standard CANDIDE instance descriptions.

CANDIDE class descriptions are used for cognitive modeling. When these are
not adequate, other specialized modeling languages can be used. Some of these are
described in Chapter 6. The resulting lexicon is basically a portion of the complete
CANDIDE database. Cognitive models are associated with classes. Lexical cases are
stored as instances. The lexical acquisition algorithm based on conceptual clustering
builds clusters of these instances and creates generalization hierarchies. The result is
an extensive web of interrelationships among lexical entries and phrasal patterns and
between these entries and other portions of the database. The steps of the lexical
acquisition algorithm, shown in Figure 5.3, are described below.

5.3.1    Initialization

In  learning  a  language,  a  human  being  has  a  tremendous  background  of  experience 
that  is  applied  to  understanding  word  usage.  A  human  can  record  the  contextual 
situation  in  which  utterances  are  made  by  using  all  five  senses.  Children  acquire  an 
enormous  memory  of  experiences  in  the  years  prior  to  learning  language.  All  this 
knowledge  is  applied  to  understanding  utterances. 

Unfortunately,  computers  do  not  yet  have  sensory  machinery  capable  of  gathering 
contextual  information  from  the  environment.  Learning  systems  that  use  case-based 
and explanation-based reasoning require an extensive preexisting database. Initializing
the database is not a trivial task. In such systems, for example CHEF [41], the
initial database is constructed by hand.

The lexical acquisition algorithm cannot begin to make inferences until some minimal
level of information has already been coded into the database. For a particular
word, this minimal level generally consists of a rudimentary class description. It is
possible to build a general purpose grammar with good coverage containing only a
few hundred rules, so these can be created and entered manually. Likewise, it is possible
to construct an initial lexicon by hand containing on the order of 1000 words.
These initial entries will not likely contain much information for each entry. This is
the point at which the lexical acquisition algorithm can begin to be of assistance.

Another potential initialization method is the use of machine-readable dictionaries
(MRD) [11]. These dictionaries are available in a pre-formatted fashion (such as LISP
notation) that can be analyzed automatically by programs. For example, Guo [40]
uses an MRD to create a semantic network of concepts. Such a network could provide
an initial set of cases and models for each word.

Data  entry  is  facilitated  by  a  special  editor  that  provides  several  different  input 
modes. In one mode, database objects can be handcrafted through direct manipulation
by using a graphical browser and editor. In another mode, hand-formatted text
can  be  processed  in  bulk.  Formatting  codes  provide  information  that  can  be  used  to 
create  objects  directly.  In  a  third  mode,  text  phrases  can  be  input  using  words  the 
system  already  partially  understands.  The  system  suggests  possible  interpretations 
which  are  accepted  or  rejected  by  the  lexicographer. 

5.3.2   Recognition of New Instance

Input a New Phrase. The recognition process begins with the input of a new
phrase. The input consists only of a string of text. The phrase is initially unannotated.
That is, no phonetic, morphologic, syntactic, or semantic information is
provided. Annotated text input can be used, simplifying the procedure. But in general,
such annotation is not available. The algorithm can also be used on speech
input, in which case the speech recognition hardware provides phonetic markings.

Word Recognition. Word recognition is first based on phonetic and morphologic
analysis. In a well-developed system with a large vocabulary, most words will be
recognized immediately from a vocabulary list, bypassing phonetic and morphologic
analysis. Such words would immediately be assigned known categories (subject to
later revision). If the word is new (perhaps as a result of a misspelling), phonetic
and morphological analysis may class the word within groups of other, similar words
known to the system. Through this first process of identification, a word will be
placed into one or more classes of related words.

Parsing. Once each word in the input has been categorized as far as possible, the
parser can begin syntactic analysis. One or more preliminary lexical categories may
be proposed for each word based on the initial categorization, and the parser uses
each of them in a non-deterministic fashion to generate all possible parse trees for the
input. Words which are completely unknown must be passed to the parser without
any constraints. The parser can proceed based on the available information, although
multiple interpretations for the unknown word will likely result [68].
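
The effect of passing unconstrained words to the parser can be illustrated by
enumerating the category sequences it would have to consider. The sketch below is
hypothetical throughout: the category inventory, the tiny lexicon, and the function
name are assumptions, and a real parser would prune most of these sequences with its
grammar rules rather than enumerate them.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical category inventory and preliminary lexicon.
ALL_CATEGORIES: Tuple[str, ...] = ("DET", "NOUN", "VT", "AUX", "ADJ")
LEXICON: Dict[str, Tuple[str, ...]] = {
    "the": ("DET",),
    "palm": ("NOUN",),
    "have": ("VT", "AUX"),
}


def category_sequences(words: Sequence[str]) -> List[Tuple[str, ...]]:
    """List every assignment of candidate categories to the input words.

    Known words contribute their preliminary categories; unknown words are
    unconstrained and contribute every category.
    """
    options = [LEXICON.get(word.lower(), ALL_CATEGORIES) for word in words]
    return list(product(*options))
```

Under these assumptions a fully known phrase such as "the palm" yields a single
sequence, while one unknown word multiplies the number of sequences by the size of the
category inventory, which is why multiple interpretations are to be expected.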

At this stage the parser operates in a standard fashion using one of the fast
algorithms for processing the context-free portion of the grammar rules (active chart
parsing [122], or an LR parser [117]). DAG unification is performed in conjunction
with the parsing [106]. The parser completes analysis to the syntactic stage and
produces one or more parse trees. Each parse tree must be analyzed as a separate
instance. Each parse tree also produces an initial surface semantic analysis based
on default argument and modifier relationships expressed by syntactic patterns as
explained in Section 5.2.2. The information generated at this stage is only preliminary.
Much more information is obtained by comparing the new instances to existing
instances and classes in the database.

Identify Related Cases and Models. The parser produces new instances, each
containing a preliminary analysis of phonetics, morphology, syntax, and semantics for
each parse tree generated from the input. Each new instance can now be compared
with other instances and models in the database. The next step is to retrieve all
existing instances and classes that may be related in some way to the new instance.
Relevant classes and instances are retrieved by applying Realization and Intersection
to categorize the new instance as was described in Section 4.2.3.

5.3.3   Reasoning Step

Exact  Match.  If  the  new  instance  fits  an  existing  instance  exactly,  any  additional 
features associated with the existing instance are automatically transferred to the new
instance,  and  the  recognition  process  is  completed. 

Map  to  Related  Instances.  An  instance  may  be  added  to  an  existing  class  through 
a  Realization  process  (see  Section  4.2.3.1).  It  then  automatically  acquires  the  defaults 
associated  with  that  class.  On  the  other  hand,  Intersection  may  produce  a  new 
class  that  raises  the  Exception  Condition.  This  causes  the  case  to  be  split  into  two 
hypotheses  resulting  from  the  decision  to  add  the  new  case  to  the  existing  class,  or  to 
create  a  new  class  (see  Section  4.2.4).  If  the  new  case  is  added  to  the  existing  class, 
the  class  structure  must  be  modified  according  to  the  schema  evolution  procedures 
described  in  Section  4.2.5.  The  modified  schema  results  in  a  set  of  defaults  which  can 
then be applied to the new instance. If instead a new class is generated, there will be
no additional default values until more instances are applied to the new class.

Application of default values is done automatically whenever a new case is assigned
to  an  existing  class,  and  the  class  contains  default  values  which  do  not  conflict  with 
values  already  specified  in  the  new  instance.  Default  values  are  obtained  by  the 
procedure  given  in  Section  4.2.6. 

Map to Related Models. If the existing instances fail to match the new instance,
then it is possible that one of the existing models may be applied to account for the
new instance. A number of techniques may be used at this step. These are essentially
the techniques of explanation-based learning (Section 3.1.3.5). They depend on the
ability to locate a suitable existing model and to incrementally alter the model if
necessary to fit the new case.

The  first  step  is  to  attempt  to  apply  an  existing  model  without  modification. 
This  is  done  by  mapping  the  surface  semantic  interpretation  produced  by  the  parser 
to the structure of the model. The predicate-attribute and head-modifier relationships
identified by the parser establish a correspondence between entities that is explicitly
defined  by  the  structure  of  a  model.  For  example,  the  noun  phrase  "red  block" 
consists  of  a  modifier-head  relationship  in  which  the  modifier  "red"  is  associated  with 
the  head  "block."  The  model  for  physical  objects  associates  "red"  with  the  object 
"block"  through  the  "color"  attribute.  Thus,  the  noun  phrase  is  mapped  to  a  model 
for  physical  objects. 

The  easiest  model  adaptation  procedure  is  known  as  concept  specialization  [82]. 
An  existing  model  schema  is  adapted  to  a  new  case  by  making  a  component  of  the 
model  more  specific.  First,  it  is  necessary  to  determine  which  component  is  relevant 
and then to determine how the component must be modified. For example, to understand
a new phrase "bird house," it is necessary to modify the "house" schema.
The  "house"  schema  contains  many  attributes,  one  of  which  is  "occupant."  Although 
"occupant"  may  be  restricted  to  the  domain  of  "people,"  it  is  possible  for  a  reasoning 
system  to  generalize  the  domain  to  "animal"  by  observing  this  relationship  between 
"people"  and  "birds."  Then  the  schema  for  "house"  is  specialized  to  "bird  house"  by 
restricting  the  domain  of  "occupant"  to  "bird." 
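
The "bird house" example can be traced in a short sketch. It is an illustration only,
assuming a schema reduced to a table of attribute domains and a hypothetical ISA
taxonomy; it is not the concept specialization procedure of [82]. The sketch first
finds the generalization that licenses the new filler ("animal") and then returns a
specialized copy of the schema, leaving the original "house" schema unchanged.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy (child -> parent).
ISA: Dict[str, str] = {
    "people": "animal",
    "bird": "animal",
    "animal": "living thing",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors, nearest first."""
    chain = [concept]
    while concept in ISA:
        concept = ISA[concept]
        chain.append(concept)
    return chain


def common_ancestor(a: str, b: str) -> Optional[str]:
    """Nearest concept that subsumes both `a` and `b`, if any."""
    others = ancestors(b)
    for candidate in ancestors(a):
        if candidate in others:
            return candidate
    return None


def specialize(schema: Dict[str, str], name: str, attribute: str,
               new_filler: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (specialized schema, licensing generalization), or None."""
    general = common_ancestor(schema[attribute], new_filler)
    if general is None:
        return None            # nothing relates the old and new fillers
    new_schema = dict(schema)  # the original schema is left untouched
    new_schema["name"] = name
    new_schema[attribute] = new_filler
    return new_schema, general


house = {"name": "house", "occupant": "people", "material": "wood"}
result = specialize(house, "bird house", "occupant", "bird")
```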

A variety of heuristic-based techniques can also be used to modify a model.
SWALE uses "tweaking rules" to explain various discrepancies between a case and a
model [103, 12]. Many of these rules are indexed by specific failures which occur when
applying a model. For example, a situation where an attribute has the wrong filler
causes particular rules to be activated that search for a modification that accounts
for the wrong filler. Much work remains in specifying and formalizing procedures for
altering models. At present, it is very difficult to generate entirely new models to
account for observed cases. This is an area for automated discovery [62].

Rank  Ambiguous  Interpretations.  In  most  cases,  interpretation  of  a  new  instance 
will  not  be  entirely  unambiguous.  When  several  interpretations  are  consistent  with 
the  existing  database,  it  is  possible  to  create  a  ranking  based  on  default  reasoning 
(see  Section  4.2.6).  This  will  also  produce  a  most  likely  interpretation. 

Dead  End.  It  is  possible  and  natural  that  a  new  instance  fails  to  satisfy  any  of 
these  reasoning  procedures,  and  thus  there  will  not  be  any  consistent  interpretation. 
Here  the  system  fails  to  recognize  the  new  instance  and  cannot  proceed  any  further 
without  additional  information. 
5.3.4   Validation  Step 

In  many  situations  a  new  instance  merely  generates  several  hypotheses  rather 
than  definite  interpretations.  This  happens  when  the  Exception  Condition  is  raised. 
Although  these  hypotheses  can  be  ranked  by  using  default  reasoning,  there  can  never 
be  certainty  that  a  hypothesis  is  correct. 

The  standard  procedure  for  verifying  a  hypothesis  is  for  the  system  to  wait  until 
additional  information  is  available.  One  way  to  do  this  is  to  wait  for  new  cases  which 
may  be  used  to  confirm  or  reject  a  hypothesis.  Another  way  is  through  interaction 
with the user. A hypothesis can be quickly confirmed or denied through
dialogue. For example, an erroneous hypothesis will result in the system using a word
incorrectly. The user can then quickly correct the mistake. Clearly this process is
very common among language speakers. Unfortunately, it further implies that a fully
automatic lexical acquisition procedure is not possible.

5.4   Example

A "family portrait" shown in Figure 5.4 illustrates a conceptual clustering of the
various uses of the word "soil" described in Section 5.1.4.

The lexical acquisition algorithm can be applied to new phrases that do not quite
match these existing phrases. The following series of examples illustrates how the
clustering algorithm is applied to lexical acquisition for the word "soil." First consider
a simple example:

soil  fertility 
With no knowledge about the word "fertility," the cases from class (soil)(WORD) are
retrieved as matching this phrase syntactically. The "ty" ending of "fertility"
indicates that fertility is a property. This conclusion comes from the cases that match
"fertility" morphologically. This restricts the cases to class (soil)(PROPERTY):

soil  type 
soil  water 
soil  pH 
soil  conditions 

Applying  the  default  semantic  interpretation  from  this  class  results  in: 

(soil  (fertility:  ?)) 
where  "fertility"  has  been  treated  as  an  attribute  of  "soil." 
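
A minimal Python sketch of this analysis follows. It assumes a small suffix table as a stand-in for the morphological matching against stored cases; SUFFIX_CLASS, guess_class, and interpret are hypothetical names introduced only for illustration.

```python
# Cases from class (soil)(PROPERTY), as listed in the text.
PROPERTY_CASES = ["soil type", "soil water", "soil pH", "soil conditions"]

# Assumed suffix table standing in for morphological matching against cases.
SUFFIX_CLASS = {"ty": "PROPERTY", "ness": "PROPERTY", "age": "PROPERTY"}


def guess_class(word):
    """Guess a semantic class for an unknown word from its ending."""
    for suffix, semantic_class in SUFFIX_CLASS.items():
        if word.endswith(suffix):
            return semantic_class
    return "WORD"


def interpret(phrase, head="soil"):
    """Apply the default interpretation of the (soil)(PROPERTY) class:
    the second word becomes an attribute of the head with an unknown value."""
    words = phrase.split()
    if len(words) != 2 or words[0] != head:
        return None
    if guess_class(words[1]) != "PROPERTY":
        return None
    return f"({head} ({words[1]}: ?))"


print(interpret("soil fertility"))
```

A phrase whose second word is not recognized as a property yields no interpretation here, which corresponds to falling back to the broader (soil)(WORD) class.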
Next  consider  the  phrase: 

surface  of  the  soil 

Knowledge about "surface" is already available from the case "soil surface." The
cases that match the new phrase syntactically are:

top  one  to  two  inches  of  soil 
large  amounts  of  soil 
conditioning  of  soil 
formation  of  soil 

Patterns:
    "soil as entity"
    <noun> of soil
    <quantity> soil
    <mod> soil
    soil
    "soil water"
    <soil-type> soil
    <problem> soil
    <drainage> soil
    soil <noun>
    soil <inhabitant>
    soil <pesticide>

Phrases:
    lift the tree from the soil
    covered with soil
    large amounts of soil
    conditioning of soil
    formation of soil
    top one to two inches of soil
    surrounding soil
    top soil
    potting soil
    excessive soil
    moist loamy soil
    rich fibrous soil
    rich moist soil
    contaminated soil
    infested soil
    well drained soil
    excessively well drained porous soil
    low soil moisture
    soil sample
    soil type
    soil water
    soil pH
    soil characteristics
    soil conditions
    soil moving equipment
    soil surface
    soil dweller
    soil insect
    soil fumigant
    soil applied herbicide

Figure 5.4: Family Portrait of the Word "soil"

Using the default semantic interpretation for these phrases, the new phrase is
interpreted:

(surface  (physical_entity:  soil))

which matches the case "soil surface" semantically.

Next  consider: 

garden  soil 

The matching cases are of the form (modifier)(soil):

surrounding  soil 
potting  soil 
top  soil 

moist  loamy  soil 
rich  fibrous  soil 
rich  moist  soil 

Without any knowledge of the word "garden," it is not possible to analyze the new
case any further than marking "garden" as a modifier. With basic knowledge of
"garden" as a noun, it is possible to eliminate all but the "(NP)(soil)" cases, but
this still results in two different hypotheses, namely "garden" as a location or "garden"
as a purpose for the soil. A cognitive model of "garden" is needed which includes
1) soil is part of a garden, and 2) gardening is one use of soil. This is analogous
to the model for "potting," and matching the case "potting soil" would lead to the
interpretation:

(soil  (purpose:  gardening)) 
Now  consider: 

poor  soil  drainage 

Similar  cases  that  will  be  retrieved  are: 

soil type
soil water
soil characteristics
soil fertility
well drained soil
excessively well drained porous soil
low soil moisture

The (soil)(property) cases would be retrieved based on the morphology of "drainage,"
which suggests that "drainage" is a characteristic. Cases containing "drained" would
also be retrieved because of the relationship with the root word "drain." Finally,
"low soil moisture" matches the phrase syntactically and semantically. Combining
the interpretations from these cases results in the analysis:

(soil  (drainage:  poor)) 
That  is,  "drainage"  is  interpreted  as  a  property,  as  was  done  in  the  analysis  of  "soil 
fertility."    The  trick  is  to  determine  that  "poor"  modifies  "drainage"  rather  than 
"soil."  An  alternative,  feasible  interpretation  would  be: 

(soil  (quality:  poor)  (drainage:  ?)) 
where  "poor"  refers  to  the  type  of  soil,  rather  than  the  type  of  drainage.  Default 
reasoning  over  the  cases  gives  preference  to  the  first  interpretation,  because  of  the 
existing  phrases  relating  "well"  with  "drained,"  and  the  case  "low  soil  moisture" 
which  (presumably)  relates  "low"  with  "moisture"  rather  than  "soil."  This  is  not  to 
say  the  second  interpretation  is  not  possible  under  the  right  conditions,  but  rather 
the  first  interpretation  is  the  default  interpretation. 
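
The preference for the first interpretation can be sketched as a vote over the retrieved cases. This is a simplified stand-in for the default reasoning of Section 4.2.6, not the procedure itself: the attachment labels on each case are assumed annotations, and rank_interpretations is a hypothetical helper.

```python
from collections import Counter

# Assumed annotations: for each retrieved case, whether its leading modifier
# attaches to the property word ("property") or to "soil" itself ("soil").
CASES = [
    ("well drained soil", "property"),
    ("excessively well drained porous soil", "property"),
    ("low soil moisture", "property"),
]

# The two candidate analyses of "poor soil drainage" given in the text.
CANDIDATES = {
    "property": "(soil (drainage: poor))",
    "soil": "(soil (quality: poor) (drainage: ?))",
}


def rank_interpretations(cases, candidates):
    """Order candidate interpretations by the number of supporting cases."""
    votes = Counter(attachment for _, attachment in cases)
    ordered = sorted(candidates, key=lambda a: votes[a], reverse=True)
    return [(candidates[a], votes[a]) for a in ordered]


ranking = rank_interpretations(CASES, CANDIDATES)
for interpretation, support in ranking:
    print(support, interpretation)
```

The lower-ranked interpretation is kept rather than discarded, which matches the point that it remains possible under the right conditions.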
Finally,  consider  a  radically  different  phrase: 

The  boy  soiled  his  clothes. 
This phrase fails to match any of the cases since "soil" never appears as a verb.
Even using the standard (NP)(VERB)(NP) template, it is not possible to relate
"boy" or "clothes," as arguments to "soil," to any of the existing cases. Without
additional information, this case cannot be analyzed semantically other than with the
association of Subject and Object. This phrase would probably need to be learned
from context, by observing a boy with soiled clothes and then seeing this phrase.

Cases other than those containing "soil" may be available. For example, there
should be a close clustering of phrases containing "soil" with phrases containing
"dirt," and from there it would be possible to relate to cases such as "dirty clothes."
Also, a morphological analysis of "soiled" would retrieve cases such as:

he  watered  the  garden 
he  aired  the  room 
he  potted  the  plant 
he  painted  the  picture 
he  wallpapered  the  room 

These phrases contain verbs obtained from a root form which is a noun. The noun
names an entity which is itself transferred to the direct object as part of the action
of the verb. Since "soiled" acts as just such a verb, it is possible to conclude that soil
is somehow being transferred to the direct object of the new case.
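
The morphological step can be sketched as follows. The noun lexicon, the "-ed" stripping rule, and the "transfer" notation for the result are all assumptions made for illustration; the text does not specify how the conclusion would be represented.

```python
# Assumed noun lexicon; a real system would consult its stored cases instead.
NOUNS = {"water", "air", "pot", "paint", "wallpaper", "soil"}


def root_noun(verb):
    """Strip a past-tense "-ed" ending (undoing consonant doubling, as in
    "potted") and return the root if it is a known noun, else None."""
    if not verb.endswith("ed"):
        return None
    stem = verb[:-2]
    candidates = [stem]
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])  # "pott" -> "pot"
    for candidate in candidates:
        if candidate in NOUNS:
            return candidate
    return None


def interpret(subject, verb, obj):
    """Hypothesize that the entity named by the root noun is transferred
    to the direct object (hypothetical output notation)."""
    noun = root_noun(verb)
    if noun is None:
        return None
    return f"(transfer (agent: {subject}) (entity: {noun}) (to: {obj}))"


print(interpret("boy", "soiled", "clothes"))
```

The result is only a hypothesis in the sense of Section 5.3.4: it still needs to be confirmed by further cases or by the user.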

In these examples, the system is assumed to have little or no knowledge about
other words appearing in the phrase besides "soil." Each example showed the least
amount of knowledge of other words needed to understand the phrase. In actuality,
the system would have much more knowledge, since clusters would be formed for all
of the words in the vocabulary. Lexical acquisition is an incremental process.

5.5    Summary 

In  this  chapter,  the  category  theory  of  word  meaning  presented  in  Chapter  3 
was  applied  to  natural  language  processing  by  using  the  formal  conceptual  clustering 
techniques  presented  in  Chapter  4.  A  new  perspective  on  computational  linguistics 
results  in  which  language  understanding  is  based  on  a  large  corpus  of  utterances  stored 
in  a  case-based  memory.  Lexical  entries  are  based  on  categories  which  group  together 
related  uses  of  a  word.  The  resulting  categories  are  far  richer  in  information  content 
than  previous  lexicon  designs.  The  INTERSECT  function  is  used  to  map  previous 
utterances  to  interpret  a  new  utterance.  The  Exception  Condition  is  used  to  identify 
lexical gaps. Default reasoning provides an important facility for disambiguation. The
application  of  these  techniques  in  the  lexical  acquisition  algorithm  results  in  a  natural 
language  processor  which  can  attempt  to  interpret  new  language  usage  patterns. 


CHAPTER  6
NATURAL  LANGUAGE,  COGNITIVE  MODELS,  AND  QUALITATIVE  SIMULATION

6.1    Introduction 


Models  used  in  qualitative  simulation  are  suitable  for  use  as  formal  cognitive 
models  such  as  those  involved  in  representing  language  meaning.  A  series  of  examples 
explores  the  mapping  between  natural  language  expressions  and  formal  models  used 
in  computer  simulation.  A  theoretical  representation  of  categories  and  word  meanings 
is  presented  in  which  cognitive  models  play  an  important  role.  The  examples  illustrate 
the  use  of  a  model  in  reasoning  and  discourse,  the  expression  of  temporal  relationships, 
verbal  descriptions  of  mathematical  expressions,  and  the  generation  of  qualitative 
descriptions of model behavior. This work can be applied in the process of software
engineering  as  natural  language  specifications  are  transformed  into  models,  or  model 
results  are  interpreted  and  reported  by  natural  language  generators.  Furthermore, 
models  of  various  kinds  are  necessary  in  systems  that  use  language. 

Cognitive models and methods in qualitative simulation [28] are related in that
each offers benefits to the other. Coarse-grained simulation models offer the benefits
of reduced complexity and an increase in model comprehensibility, especially when
hierarchically organized with models of a finer grain (such as
equational models). A particular model serves to answer a certain class of questions.
For  instance,  a  simple  finite  state  machine,  where  states  are  linguistic  concepts,  can 
serve  as  a  crude  simulation  model  as  long  as  the  results  that  can  be  successfully 
obtained  by  simulating  that  model  are  suitable  —  in  other  words,  the  "bounds"  of 
what the model is capable of representing cannot be exceeded. If one's question to
a system (in a manufacturing domain) is "Is the lathe used after the part cleaning
process?" then an automaton or model in temporal logic may be sufficient. These
models,  though,  are  inadequate  when  used  to  try  to  answer  questions  such  as  "What 
is  the  rotation  speed  three  seconds  after  machine  startup?" 
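
The bounds of such a coarse model can be illustrated with a small Python sketch. The process steps and their order are invented for the example; only the lathe and the part cleaning process come from the text.

```python
# Assumed manufacturing process, modeled as a transition table between
# linguistic states. Only "part cleaning" and "lathe" appear in the text.
PROCESS = {
    "receiving": ["part cleaning"],
    "part cleaning": ["lathe"],
    "lathe": ["inspection"],
    "inspection": [],
}


def used_after(transitions, later, earlier):
    """Return True if state `later` is reachable from state `earlier`."""
    stack = list(transitions.get(earlier, []))
    seen = set()
    while stack:
        state = stack.pop()
        if state == later:
            return True
        if state not in seen:
            seen.add(state)
            stack.extend(transitions.get(state, []))
    return False


# "Is the lathe used after the part cleaning process?"
print(used_after(PROCESS, "lathe", "part cleaning"))
```

The table holds ordering information only. It has no notion of elapsed time or rotation speed, so a question about the speed three seconds after startup cannot even be posed against it.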

Natural  language  (NL)  serves  as  an  excellent  starting  point  when  considering 
coarse-grained  model  structures;  many  system  descriptions,  problems,  and  answers 
are often stated in terms of natural language. The chief problem with natural
language, as a simulation modeling language, relates to the lexical ambiguities and
incomplete knowledge associated with natural language models. However, this is not
a  reason  to  despair  and  dismiss  the  study  of  natural  language  models.  People  will 
continue  to  think  and  write  system  descriptions  in  natural  language  whether  or  not 
there  exists  software  to  manage  NL  models.  It  is  logical  that  our  everyday  language  of 
choice  will  serve  as  a  vehicle  through  which  many  rough  simulation  model  structures 
(at  least  at  the  early  stages  of  model  development  [29])  are  expressed. 

In  this  chapter  the  important  role  of  cognitive  models  in  forming  conceptual  cat- 
egories of  NL  expressions  is  discussed.  It  is  suggested  that  qualitative  simulations 
would  be  generated  as  a  particular  kind  of  cognitive  model.  Natural  language  expres- 
sions have  much  in  common  with  simulation  models.  A  sentence  contains  constituent 
phrases  each  expressing  an  idea  (submodel),  and  the  phrases  are  connected  by  syntax 
(submodel  interaction).  One  could  claim  that  the  language  of  mathematics  is  clear, 
precise,  and  unambiguous,  but  it  too  has  a  grammar  and  its  symbols  are  given  mean- 
ing through  conventions  set  by  those  using  the  language.  Conventions  for  algebraic 
expressions,  for  instance,  take  the  form  of  grammars  and  symbols  with  well-defined 
semantics  necessary  for  calculating  the  resulting  value  of  an  expression.  Language 
expressions undergo transformation [30]. For example, English sentences are
transformed into an internal representation (semantic structures). One such
transformation could be into a formal qualitative or quantitative model.

It  is  suggested  that  qualitative  models  can  be  used  as  cognitive  models  for  rep- 
resenting natural  language  meaning  and  reasoning.  Qualitative  models  are  special 
cases  of  cognitive  models.  Several  examples  are  presented  in  the  next  section  in  order 
to  illustrate  the  role  of  qualitative  models  in  language  understanding. 

The  evolution  of  category  theory  can  be  briefly  summarized  as  follows: 

1.  The  classical  view  specifies  necessary  and  sufficient  conditions  for  category  mem- 
bership. 

2.  Similarity  and  case-based  reasoning  build  categories  by  comparing  instance  de- 
scriptions. 

3.  Cognitive  models  are  needed  to  describe  instances  in  the  first  place  and  deter- 
mine what  similarity  is  important. 

4.  The  concept  of  cognitive  model  needs  to  be  formalized.  There  is  in  fact  a  large 
number  and  diversity  of  models  on  any  particular  subject. 

The  organization  of  memory  is  characterized  by  a  large  number  of  models  and 
instances.  Models  "compete"  through  their  ability  to  explain  the  characteristics  and 
similarities  of  instances.  Still  there  will  always  be  one  or  more  instances  that  are 
exceptions  to  a  particular  model.  New  models  are  generated  to  explain  the  exceptions, 
but  this  is  a  never  ending  process. 

Notice  the  relevance  of  these  points  to  simulation.  Item  1  above  is  violated  when- 
ever it  is  supposed  that  there  is  just  one  model  of  the  domain  and  that  it  completely 
describes  the  domain.  Item  2  is  the  system  identification  process.  Even  the  collection 
of  empirical  data  must  conform  to  Item  3  since  the  data  would  have  no  meaning 
without a model of how the data is collected and a description of what the numerical
values mean. Much work is needed for Item 4, and it is proposed that qualitative
models  may  address  part  of  the  need. 

In  the  remainder  of  this  chapter,  a  number  of  qualitative  models  will  be  examined 
to  illustrate  the  close  connection  between  natural  language  and  models.  The  purpose 
of  this  section  was  to  illustrate  general  issues  involved  in  representing  language  mean- 
ings. The  diversity  in  language  usage  requires  a  corresponding  diversity  in  models. 
There  remains  much  to  be  done  in  formalizing  the  notion  of  cognitive  model  and 
building  systems  capable  of  managing  the  resulting  large  scale  memory  organization. 

6.2   Example  -  "GROW" 

A  simple  example  of  a  model-based  approach  to  representing  word  meaning  is 
shown  in  Figure  6.1  which  contains  two  models  for  the  word  "grow."  Add  to  these 
simple models additional relationships for time and causality and a good deal more
detail,  and  it  is  possible  to  represent  more  complex  types  of  models  suitable  for 
simulation. 

These simple models capture aspects of the event nature and dynamic nature of
the concept "grow." In Figure 6.1a, "grow" is a type of Action, which implies that
it takes place at a particular time and location. There may be an Agent involved in
the growing and (in the transitive form of the verb) an Object that is being grown.
For example, "A person grows a plant." This model will suffice, and indeed would be
required, for understanding phrases such as:

grow actively
grow tall
grow in the field
grow flowers in clusters
grow best

Since this model inherits relationships of Actions, it can also account for phrases such
as:

grow in the spring

which introduces an inherited temporal aspect.

Grow
    ISA:  Action
    Agent:  <Person, Plant>
    Object:  <Plant>
    Height:  <Numerical Value, Short, Tall>
    Pattern:  <Cluster, Row, Alone>
    Rate:  <Slow, Fast>
    Quality:  <Good, Bad>
(a)

Mechanism:  Birth ► Immature ► Mature ► Death
(b)

Figure 6.1: Two Models for Representing the Word "Grow" a) "Grow" as an Action
b) "Grow" as a Dynamic Process

A crude description of the mechanism of growing captures dynamic relations
(Figure 6.1b). The mechanism model is needed to interpret phrases that refer to dynamic
processes such as:

seedlings  grow  into  trees 
grow  to  maturity 
grow  up 

More interesting are the ways in which this model fails. The model fails to
account for phrases such as:

grow  new  roots 
To  accept  this,  the  domain  of  the  Object  attribute  of  "grow"  (Figure  6.1a)  must  be 
diluted  to  include  plant  parts  in  addition  to  plants.    This  dilution  effect,  that  the 
model  becomes  less  specific  as  more  instances  are  encountered,  is  a  clear  result  of  the 
family  resemblance  principle. 
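
The dilution effect can be sketched with the frame of Figure 6.1a written as a Python dictionary. The ISA table and the helper names (is_a, accepts, dilute) are hypothetical; the widening rule shown, adding the immediate category of the rejected filler, is one simple choice among several.

```python
# Assumed taxonomy for the example.
ISA = {"root": "plant part", "leaf": "plant part",
       "tree": "plant", "seedling": "plant"}


def is_a(concept, category):
    """Follow ISA links upward from concept, looking for category."""
    while concept is not None:
        if concept == category:
            return True
        concept = ISA.get(concept)
    return False


def make_grow_model():
    """Part of the "grow" frame from Figure 6.1a, with attribute domains."""
    return {"ISA": "Action", "Agent": {"person", "plant"}, "Object": {"plant"}}


def accepts(model, attribute, filler):
    """True if the filler falls within the attribute's current domain."""
    return any(is_a(filler, domain) for domain in model[attribute])


def dilute(model, attribute, filler):
    """Widen the attribute's domain with the filler's immediate category
    when the filler is rejected. The model becomes less specific."""
    if not accepts(model, attribute, filler):
        model[attribute].add(ISA.get(filler, filler))
    return model


grow = make_grow_model()
print(accepts(grow, "Object", "root"))   # "grow new roots" is rejected
dilute(grow, "Object", "root")
print(sorted(grow["Object"]))            # domain now includes plant parts
```

After the dilution, "grow new leaves" is accepted as well, although no such instance was observed.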

Another  failure  is  illustrated  by: 

grow  tired 
grow  hungry 

These metaphors can be understood by extending existing cases (<become tired>,
<grow larger>) and mapping to a modified version of the existing dynamic model
(growth is a dynamic process of gradual change from state to state). For example:

the  lights  grow  dimmer 
*the  lights  grow  out 

The second phrase sounds awkward because it does not describe a gradual
process.

With additional instances of "grow," the complex cluster of meanings begins to
emerge. A "family portrait" for "grow" is outlined in Figure 6.2. These instances
were taken from a corpus of text composed of technical literature on ornamental
plants. The analysis is conducted along the guidelines presented in Chapter 5. Overall,
the instances shown in the figure have little in common. Yet the instances are not
mutually exclusive. Instances are grouped together by their similarity. They are
similar in the precise sense that they have some common structural features. For
example, the first cluster of instances describes growth in excess of a particular height.

grow

Categories:  size, pattern, rate, location

Instances (clusters separated by blank lines):

    grow over 10 feet tall
    grow larger than 25 feet

    grow to be 4 inches

    grow several feet long

    grow in clusters
    grow by itself

    grow rapidly
    grow fast
    grow vigorously

    grow poorly

    grow in tropical climates
    grow in Florida

    grow inland

Figure 6.2: A Family Portrait Showing Various Usages of the Word "Grow"

6.3   Language  Descriptions  Involving  Spatial  Reasoning 

This  example  shows  how  a  qualitative  model  is  needed  both  in  understanding 
and  reasoning  about  a  paragraph  describing  a  simple  physical  system.  The  following 
description  of  a  sundial  is  taken  from  a  children's  book  on  elementary  science  [111]. 
As in all the examples, considerable prior knowledge about the physical system, in
this case knowledge about movement of the sun and shadows, is needed to understand
the passage. These two sentences describe the components of the sundial:

"Sundials  are  flat,  circular  plates  marked  off  into  hours.    A  metal  stick, 
called  a  gnomon,  points  towards  the  North  Pole." 

It  is  not  clear  that  these  sentences  can  be  understood  without  a  visual  image  of 
the  sundial  (which  is  provided  in  the  book).  The  following  structural  description  can 
be  built  from  the  literal  interpretation  of  the  sentences  by  using  a  natural  language 
processor.  The  exact  relationship  between  the  gnomon  and  plate  is  unclear: 

Sundial
    Components:  Plates
                     Shape:  Flat, Circular
                     Markings:  Graduated
                         Unit:  Hours
                 Gnomon
                     Shape:  Stick
                     Material:  Metal
                     Orientation:  Point To
                         Entity:  North Pole

Since this example relies heavily on visualization, a representation which captures
the  spatial  aspects  must  be  used.  For  this,  an  image-schema  notation  developed  by 
Lakoff  [61]  can  be  used.  The  image-schema  based  only  on  the  literal  interpretation 
of  the  first  two  sentences  (Figure  6.3)  describes  the  metal  plate  and  the  gnomon 
pointing  to  the  North  Pole.  The  precise  relation  between  the  gnomon  and  plate  is 
ambiguous. 

The  next  sentence  elaborates  both  the  structural  and  dynamic  components: 

"When  the  sun  shines  on  the  sundial,  the  gnomon  makes  a  shadow  on  the 
plate." 

[Image-schema with labels NORTH POLE and GNOMON]

Figure 6.3: Sundial Version 1 - Components

Now there is a constrained relationship between the gnomon, plate, and sun, dictated
by knowledge of shadows:

Shadows
    Light Source:  Sun
    Block:  Gnomon
    Falls On:  Sundial

And the visual image corresponds to Figure 6.4.

Now the time-varying component is created by the earth's rotation:

"As  the  earth  turns,  the  shadow  moves  from  one  mark  to  the  next  on  the 
plate,  telling  what  hour  it  is." 

Understanding  this  requires  the  causal  connection  between  the  earth's  rotation,  the 
angle  of  the  sun,  and  position  of  shadows.  This  can  be  expressed  in  a  mathematical 
relationship: 

$$\Delta\theta(\mathrm{Earth}) = C_1 \cdot \Delta\theta(\mathrm{sun}) = C_2 \cdot \Delta\theta(\mathrm{shadow})$$

Figure 6.4: Sundial Version 2 - Introducing the Shadow

However, this is not likely to be the way small children would reason about such a
phenomenon.  Rather,  a  time-varying  component  can  be  added  to  the  visual  schema. 
Namely,  a  change  in  the  position  of  the  earth  causes  a  change  in  the  position  of  the 
sun,  which  causes  a  change  in  the  position  of  the  shadow  (Figure   6.5). 
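
This qualitative causal chain can be sketched in Python without any of the constants of a mathematical relationship. The precondition attached to the last link ("sun visible") is an assumption added for the example, as are the names CAUSES, REQUIRES, propagate, and sundial_works.

```python
# Qualitative causal links: a change in the key causes a change in the value.
CAUSES = {
    "earth position": "sun position",
    "sun position": "shadow position",
}

# Assumed precondition: the sun's position affects the shadow only while
# the sun is actually visible (no clouds, not night).
REQUIRES = {("sun position", "shadow position"): "sun visible"}


def propagate(start, facts):
    """Return the variables that change when `start` changes, following
    causal links until one of them lacks its precondition."""
    changed = [start]
    current = start
    while current in CAUSES:
        effect = CAUSES[current]
        needed = REQUIRES.get((current, effect))
        if needed is not None and needed not in facts:
            break  # the mechanism breaks down at this link
        changed.append(effect)
        current = effect
    return changed


def sundial_works(facts):
    """The sundial tells time only if the earth's turning moves the shadow."""
    return "shadow position" in propagate("earth position", facts)


print(propagate("earth position", {"sun visible"}))
print(sundial_works(set()))
```

Running the chain with the precondition removed stops before the shadow is reached, which is the kind of breakdown a reasoner can use to explain when the device fails.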

Thus  far  the  qualitative  model  has  been  used  to  interpret  the  sentences.  The  proof 
lies  in  being  able  to  reason  by  using  the  model: 

"Sundials  were  helpful,  but  they  could  not  be  used  on  cloudy  days  or  at 
night..." 

So why can't sundials be used on cloudy days or at night? The explanation can be
derived from the image-schema. Superimposing the image-schema for cloudy days, or
running the model until sunset, results in a breakdown of the mechanism shown
in Figure 6.5. Without the image-schema, an explanation would be difficult to
obtain. Interestingly, the text explicitly states this explanation, indicating that small
children should not be expected to carry out such advanced reasoning unaided. The
passage concludes:

"...because  there  weren't  any  shadows  then!" 

6.4   Models  Capturing  Temporal  Relationships 

This example attempts to represent temporal events, not with mathematical
relationships, but with qualitative relationships expressed in a natural language text.
There is a considerable amount of work on temporal reasoning that can be used for
this purpose [2,81]. In this example, statements about properties and time appearing
in a natural language text description of time-varying events are mapped to a simple
model of the temporal relationships among the events. Properties associated with
the entities participating in these events are also extracted and represented
explicitly along with the temporal information. This could be the first stage in building a
mathematical model.

Figure 6.5: Sundial Version 3 - Dynamic Relationships

The analysis is performed on the following paragraph describing the life history
of a parasite (Tetrastichus julis) that attacks an insect pest (cereal leaf beetle) that
feeds on grain crops [35]:

"Adults  parasitize  cereal  leaf  beetle  larvae  feeding  on  the  leaves  of  small 
grains.  Late  instar  larvae  of  T.  julis  overwinter  in  the  soil  within  cereal 
leaf  beetle  pupal  cells  formed  in  late  June.  An  average  of  5  parasite  larvae 
can  be  found  within  each  pupal  cell.  In  late  May  T.  julis  larvae  complete 
their  development  and  the  adults  chew  through  the  pupal  cell  and  make 
their  way  to  the  soil  surface  where  they  mate  and  disperse  to  grain  fields. 
At  this  time,  T.  julis  can  be  seen  searching  the  upper  surface  of  spring 
grains  for  cereal  leaf  beetle  larvae." 

It  is  possible  for  a  natural  language  processor  to  analyze  this  text  and  construct 
representations  for  both  factual  and  temporal  information  directly  from  the  words 
and grammatical structure of each sentence. For example, the first sentence:

"Adults parasitize cereal leaf beetle larvae feeding on the leaves of small
grains."

can be analyzed at a syntactic level, resulting in a parse tree (Figure 6.6).

The surface semantics of this sentence is represented by a predicate constructed
by the language processor:

Parasitize(Adult T. julis,
           Larvae[Cereal leaf beetle,
                  Feeding[Leaves[Grain[Small]]]])

The main verb "Parasitize" is the main predicate, and the Subject "Adults" and
Object "Larvae" are arguments. The modifiers (in brackets) "Cereal leaf beetle"
and "feeding on the leaves of small grains" are structurally attached to the object.
That "Adults" refers to the adults of "T. julis" must be inferred from
the context of the paragraph.

SENTENCE
    NOUN PHRASE
        NOUN - Adults
    VERB PHRASE
        VERB - Parasitize
        NOUN PHRASE
            NOUN PHRASE
                PROPER NOUN - Cereal leaf beetle
                NOUN - Larvae
            GERUND PHRASE
                GERUND - Feeding
                PREPOSITIONAL PHRASE
                    PREPOSITION - On
                    NOUN PHRASE
                        NOUN PHRASE
                            DETERMINER - The
                            NOUN - Leaves
                        PREPOSITIONAL PHRASE
                            PREPOSITION - Of
                            NOUN PHRASE
                                ADJECTIVE - Small
                                NOUN - Grains

Figure 6.6: Parse Tree for "Adults parasitize Cereal Leaf Beetle larvae feeding on the
leaves of small grains."

The predicate is used to instantiate the generic concept "Parasitize." Prior
knowledge of this concept is needed to interpret the sentence. The generic concept is
represented by the object:

Parasitize
    SUPERCLASS:  Action
    ATTRIBUTES
        Parasite:  ATLEAST Organism
        Host:  ATLEAST Organism

Parasitize  is  an  Action  involving  a  Parasite  and  a  Host,  both  of  which  are  Organisms. 
Given  this  object,  the  action  of  parasitism  expressed  in  the  first  sentence  can  be  used 
to  form  the  following  instance: 

Parasitize
    SUPERCLASS:  Action
    ATTRIBUTES
        Parasite:  Adult T. julis
        Host:  Cereal leaf beetle
            ATTRIBUTES
                Growth Stage:  Larva
                Habitat:  Small Grains
                Feeding_Site:  Leaves

Information  about  cereal  leaf  beetle  larvae  is  also  embedded  within  this  object  and 
is  obtained  by  instantiating  the  generic  concept  for  Larva. 
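
Instantiating the generic concept can be sketched as a check of each filler against its ATLEAST restriction. The ISA table, the dictionary layout, and the function names are assumptions made for this illustration, and ATLEAST is read here simply as "must be subsumed by".

```python
# Assumed taxonomy; only the insect names and "Organism" come from the text.
ISA = {
    "T. julis": "Insect",
    "Cereal leaf beetle": "Insect",
    "Insect": "Organism",
    "Leaf": "Plant part",
}

# Generic concept: each attribute carries an ATLEAST restriction.
PARASITIZE = {
    "superclass": "Action",
    "attributes": {"Parasite": "Organism", "Host": "Organism"},
}


def is_a(concept, category):
    """Follow ISA links upward from concept, looking for category."""
    while concept is not None:
        if concept == category:
            return True
        concept = ISA.get(concept)
    return False


def instantiate(concept, fillers):
    """Build an instance after checking every filler against its restriction."""
    for attribute, filler in fillers.items():
        required = concept["attributes"].get(attribute)
        if required is None:
            raise KeyError(f"unknown attribute: {attribute}")
        if not is_a(filler, required):
            raise ValueError(f"{filler} is not at least {required}")
    return {"superclass": concept["superclass"], "attributes": dict(fillers)}


instance = instantiate(
    PARASITIZE, {"Parasite": "T. julis", "Host": "Cereal leaf beetle"}
)
print(instance)
```

A filler outside the restriction, such as a leaf in the Host slot, raises an error in this sketch. In the system described here such a violation would more naturally raise the Exception Condition instead.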

Temporal  relationships  concerning  the  life  stages  of  the  insects  are  implicit  in  this 
sentence.  Specifically,  it  is  the  adult  developmental  stage  of  T.  julis,  rather  than 
some  other  stage  such  as  eggs  or  pupae,  that  does  the  parasitizing  of  the  larvae.  It  is 
the  larval  stage  of  cereal  leaf  beetle,  rather  than  some  other  stage,  that  is  attacked. 
Background  knowledge  about  the  life  history  of  insects  is  needed  to  understand  these 
relationships.  Specifically,  all  insects  begin  as  eggs,  and  eggs  hatch  into  larvae  which 
progress  through  several  stages  or  instars.  Then  they  enter  a  pupal  stage,  after  which 
adults  emerge  and  eventually  lay  eggs. 

In  a  similar  fashion,  other  factual  and  temporal  information  described  in  the 
paragraph can be represented in predicate form:

Overwinter(Larvae, In Soil)
    (In Cereal leaf beetle Pupae)

Be_In(5 parasite larvae, One Cereal leaf beetle Pupa)

Begin(Life, Cereal leaf beetle Pupa)
    (Time(Late June))

Become(T. julis Larvae, T. julis Adults)
    (Time(Late May))

Emerge(Adult T. julis, From Pupa, From Soil)
    (Time(Late May))

Disperse(Adult T. julis, To Grain Fields)
    (Time(Late May))

Search_for(Adult T. julis, Cereal leaf beetle)
    (Location(Spring Grains))

Notice that factual information such as "5 parasite larvae per pupa" is directly
associated with other information. Properties associated with each concept are
connected structurally, and these properties can include both factual and temporal
relationships. For example, all the information associated with the adult T. julis can be
expressed by the object:

T. julis
    SUPERCLASSES: Insect, Parasite
    ATTRIBUTES
        Growth Stage: Adult
        Host: Cereal leaf beetle
        Behavior: Emergence, Mating, Dispersion, Searching,
                  Parasitize, Oviposition

where  Insect,  Parasite,  Adult,  Cereal  leaf  beetle,  Emergence,  and  so  forth,  are  also 
complex  objects. 

Since the objects associated with Behavior are actions and events and therefore
occur in time, the temporal events can be highlighted as shown in Figure 6.7, which
emphasizes the dynamic processes that are occurring. This resembles a state diagram
except that the objects not only represent state variables but also contain all the
information associated with each concept.

Figure 6.7: Semantic Network for Temporal Events in the Life History of T. julis

Notice that time is being represented as multiple states by using different
objects. Thus, several objects are used to represent the life stages of each insect. There
are three different objects for representing T. julis adults. One object represents the
newly emergent adult which is in the mating stage, another represents an adult which
is exhibiting dispersion activity, and another represents a later stage for ovipositing
adults. This notion of time involves objects "predicting" their next state by pointing
to another object. Although it is based only on the information expressed in the
natural language description, this way of expressing temporal relationships is consistent
with later stages of modeling in which each life stage is represented by a state
variable, and difference equations represent the time progression of the insect population
through life stages.
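
The idea of objects that "predict" their next state by pointing to another object can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `StageObject`, the helper functions, and the stage labels are hypothetical, and the attribute values are copied from the predicates listed earlier rather than from any implemented system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StageObject:
    """One temporal state of an insect; next_state points to the predicted successor."""
    name: str
    attributes: dict = field(default_factory=dict)
    next_state: Optional["StageObject"] = None


def chain(*stages):
    """Link the stages in order so that each one points to the stage that follows it."""
    for current, following in zip(stages, stages[1:]):
        current.next_state = following
    return stages[0]


def stages_after(start, target_name):
    """Follow the pointers from start; return the names visited up to target_name."""
    path, node = [], start
    while node is not None:
        path.append(node.name)
        if node.name == target_name:
            return path
        node = node.next_state
    return None  # the target is not reachable from this state


# Hypothetical labels for the T. julis stages described in the text.
larva = StageObject("Overwintering larva", {"Location": "In Soil"})
mating = StageObject("Newly emerged adult (mating)", {"Time": "Late May"})
dispersing = StageObject("Dispersing adult", {"Time": "Late May"})
ovipositing = StageObject("Ovipositing adult")
chain(larva, mating, dispersing, ovipositing)

print(stages_after(larva, "Dispersing adult"))
```

Because each object carries its own attributes, a traversal of this kind can return descriptive information about each stage as well as the order of the stages.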

Even with the crude model of Figure 6.7, which contains only structural
relationships and no quantitative information, questions such as "When will T. julis emerge?"
or "When will Cereal leaf beetle pupate?" or "How many parasites are there per
pupa?" can be answered. More specific answers can be given as more detailed knowledge
is added, but this is a natural result of refining the model.

Model  refinement  [26]  occurs  as  more  information  is  added  to  the  representation. 
For  example,  experiments  were  conducted  to  determine  the  rate  of  development  of 
T.  julis  larvae  as  a  function  of  temperature.  Again,  background  knowledge  of  insect 
development  reveals  that  development  rate  is  a  linear  function  of  temperature  above 
some  minimum  threshold.  Suppose  it  is  learned  that: 

"A  developmental  threshold  was  estimated  from  data  using  regression  to 
determine  the  threshold." 

As  a  result  of  the  experimental  data  and  regression  analysis,  the  following  equation 
for  development  was  obtained  where  the  development  rate  y  is  expressed  as  a  function 
of  the  temperature  x: 

\[
y = \begin{cases}
-12.8 + 0.27x & \text{if } x > 48 \\
0 & \text{if } x \le 48
\end{cases}
\]
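
As a minimal sketch, the piecewise relation can be written as a Python function. The function and parameter names are hypothetical, and the units of temperature and development rate are not stated in the text, so none are assumed here.

```python
def development_rate(temperature, threshold=48.0, offset=-12.8, slope=0.27):
    """Threshold-proportional development rate y for a temperature x.

    Above the threshold the rate is linear in temperature; at or below
    the threshold no development occurs.
    """
    if temperature > threshold:
        return offset + slope * temperature
    return 0.0


for x in (40, 48, 60):
    print(x, development_rate(x))
```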

This information can now be associated with T. julis larva by attaching it to the
object shown in Figure 6.8, where the equation has been converted to an object,
Threshold-Proportional. Such information could be used to reason more precisely
about when events in the life history of T. julis will occur. The utilization of
quantitative information is examined in more detail in the next section.

T. julis
    SUPERCLASSES: Insect, Parasite
    ATTRIBUTES
        Development Stage: Larva
        Host: Cereal leaf beetle
        Development Rate:
            Threshold-Proportional
                Var1: Y
                    Represents: Development Rate
                Var2: Sum
                    Offset: -12.8
                    Multiply
                        Var1
                            Constant: 0.27
                        Var2: X
                            Represents: Temperature
                Threshold: Value: 48
                    Variable X
                        Represents: Temperature

Figure 6.8: Object for T. julis Containing Development Rates

6.5   Models  Capturing  Quantitative  Relationships 

Another example comes from a one-paragraph description of a nanoplankton
respiration model taken from a textbook on biological control systems [77]:

"The sunlight is the input to the system and is represented by a battery
of voltage et. The production rate f of material by photosynthesis is
proportional to the difference between et and a "backup" potential e, of
material in the system, with the constant of proportionality being looked
upon as the battery conductance 1/Rb. The community respiration rate
fr is assumed proportional to the potential e, and the storage rate fc
proportional to the rate of change of potential in the community cellular
storage capacity C. Finally, the total production rate f must equal the
sum of respiration and storage rates."
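
Read literally, the four statements in this paragraph correspond to the relations below. This is a sketch rather than the textbook's own formulation: the battery voltage is written $e_t$ as printed in the quotation, and $C_1$ is a label of convenience for the respiration constant, which the quoted paragraph does not name.

```latex
\begin{align}
f   &= \frac{1}{R_b}\,(e_t - e)   && \text{production by photosynthesis} \\
f_r &= C_1\, e                    && \text{community respiration} \\
f_c &= C\,\frac{de}{dt}           && \text{storage} \\
f   &= f_r + f_c                  && \text{production equals respiration plus storage}
\end{align}
```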

A system called NATSIM was constructed that can generate mathematical
equations based on the descriptions given in this paragraph. For example, the sentence:

"The community respiration rate fr is assumed proportional to the
potential e."

has  the  predicate  form: 

proportional[assumed, to(e [potential])]
    (fr [rate [respiration [community]]])

The  predicate  is  used  to  instantiate  the  concept  "Proportional": 

Proportional
    SUPERCLASSES: Equal
    ATTRIBUTES
        Var1: Variable
            Symbol: Fr
            Represents: Rate
                Type: Respiration
                Of: Community
        Var2: Multiply
            Var1: Constant
                Symbol: C1
            Var2: Variable
                Symbol: e
                Represents: Potential


Here  the  generic  notion  of  Proportional  implies  that  a  variable  is  equal  to  the  product 
of  a  constant  (the  constant  of  proportionality)  by  another  variable.  Multiply  is  a 
binary  relationship  between  two  variables.  Notice  that  as  in  the  previous  example 
additional  information  is  associated  with  the  mathematical  terms,  namely  that  fr 
represents  the  rate  of  community  respiration  and  that  e  is  a  potential. 

The  background  knowledge  needed  to  understand  this  paragraph  is  shown  in  the 
concept  generalization  hierarchy  of  Figure  6.9.  Notice  that  certain  classes  represent 
mathematical  operations.  For  example,  Proportional  is  a  special  case  of  Equal.  That 
is, an equation of proportionality is a special kind of equation. In addition, there
are  classes  describing  electrical  components  and  power  sources.  These  are  used,  for 
example,  to  understand  the  analogy  between  a  battery  and  sunlight.  Also,  concepts 
dealing  with  biological  processes  such  as  photosynthesis  and  respiration  are  needed 
to  understand  the  references  in  the  paragraph. 

The generation of mathematical equations is treated as a language translation
problem. Just as objects are created through parsing English sentences and
instantiating generic concepts, sentences in any language can be generated by a reverse
process. A simple context-free grammar was used for generating mathematical
equations from objects of class Equal:

Equal → exp = exp.
exp → exp + exp.
exp → exp − exp.
exp → exp × exp.
exp → exp/exp.
exp → dexp/dvar.
exp → (exp).
exp → var.
exp → numb.

Figure 6.9: Concept Generalization Hierarchy Showing Sample Objects

For example, the object for Proportional given earlier can be used in this derivation:

Proportional →
exp = exp →
var = exp →
fr = exp →
fr = exp × exp →
fr = var × exp →
fr = C1 × exp →
fr = C1 × var →
fr = C1 × e.

This  derives  the  equation  form  for  the  sentence  about  community  respiration  rate. 
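
The reverse mapping from object to equation can be sketched as a recursive traversal in which each class selects one grammar rule. The Python below is not NATSIM's implementation; the class names mirror the object slots shown earlier, the function `generate` is hypothetical, and `*` stands in for the multiplication sign of the derivation.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    symbol: str
    represents: str = ""


@dataclass
class Constant:
    symbol: str


@dataclass
class Multiply:
    var1: object
    var2: object


@dataclass
class Proportional:
    """A special case of Equal: var1 equals a constant times a variable."""
    var1: object
    var2: object


def generate(node):
    """Produce equation text by applying the grammar rule that matches the object."""
    if isinstance(node, Proportional):          # Equal -> exp = exp
        return f"{generate(node.var1)} = {generate(node.var2)}"
    if isinstance(node, Multiply):              # exp -> exp * exp
        return f"{generate(node.var1)} * {generate(node.var2)}"
    if isinstance(node, (Variable, Constant)):  # exp -> var
        return node.symbol
    raise TypeError(f"no grammar rule for {type(node).__name__}")


respiration = Proportional(
    var1=Variable("fr", "Rate of community respiration"),
    var2=Multiply(Constant("C1"), Variable("e", "Potential")),
)
print(generate(respiration))
```

The descriptive slots (what each symbol represents) are carried along in the objects but are not emitted, which matches the observation that the equation is only one rendering of a richer structure.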

Once  the  equations  have  been  produced,  behavioral  analysis  can  be  conducted 
with  the  help  of  a  natural  language  query  to  answer  questions  such  as  "Is  this  a 
stable  system?"  or  "What  is  the  time  constant  for  the  transient  behavior?"  These 
can  be  answered  by  sending  the  system  equations  to  a  simulator.  The  simulator  could 
be  coupled  with  a  control  systems  analysis  expert  system  [77]. 

This  example  illustrates  that  mathematical  expressions  can  merge  directly  with 
natural  language  expressions.  Once  again,  background  knowledge  needed  to  fully 
understand the system under study must be explicitly represented. In this example,
the  mechanics  of  solving  the  equations  still  had  to  be  handled  by  an  external  process. 

6.6   Natural  Language  Generation 

There  are  many  situations  in  which  a  qualitative,  natural  language  description 
of  simulation  behavior  and  results  is  highly  desirable.  This  is  especially  true  at 
run time, where simulation users are decision makers who are unconcerned with or not
knowledgeable  about  model  details.  Generating  such  descriptions  is  the  reverse  of 
the  NL  understanding  process. 

Consider  the  predator/prey  model: 

\[ \frac{dx_1}{dt} = k_1 x_1 - k_2 x_1 x_2 \]

\[ \frac{dx_2}{dt} = k_3 x_1 x_2 \]

where x1 is the prey population density, x2 is the predator population density, k1 and
k3 are the prey and predator population growth rates, respectively, and k2 is the
predation rate. The basic model along with several modes of behavior are represented
by the category structure shown in Figure 6.10.

Predator/Prey Model
    Prey Increases Exponentially (x2 = 0 or k1 >> k2)
    Stable Oscillations
    Prey Extinction (k2, k3 >> k1)

Figure 6.10: Category Structure for Predator/Prey Model

An example of a class description that would be included in the "prey increases
exponentially" category would be:

Prey_Increases_Exponentially
    SUPERCLASSES: Population Explosion, Exponential Growth
    SUBCLASSES: Predator Extinct, Low Levels of Predators
    ATTRIBUTES
        Quantitative Parametric Condition: K1 >> K2
        Qualitative Condition: Prey Population Growth Rate >> Predation Rate
        Dynamic Behavior: Rapid Increase (X1)

Naturally there could be many other classes representing the different types of
predator/prey models, different qualitative behaviors, and different explanations for
observed population dynamics.
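
To make the parametric conditions of Figure 6.10 concrete, the following Python sketch assigns a parameter set to one of the three categories. Several choices here are assumptions rather than part of the text: "much greater than" is taken to mean a factor of ten, any case that meets neither extreme condition is placed under stable oscillations, and only parameters are examined, whereas an actual run would be classified from its observed behavior.

```python
def much_greater(a, b, factor=10.0):
    """Assumed reading of '>>': a exceeds b by at least the given factor."""
    return a > factor * b


def classify_run(k1, k2, k3, x2_initial):
    """Place a predator/prey parameter set under one category of Figure 6.10."""
    if x2_initial == 0 or much_greater(k1, k2):
        return "Prey Increases Exponentially"
    if much_greater(k2, k1) and much_greater(k3, k1):
        return "Prey Extinction"
    return "Stable Oscillations"


print(classify_run(k1=1.0, k2=0.01, k3=0.5, x2_initial=5.0))
print(classify_run(k1=0.1, k2=5.0, k3=5.0, x2_initial=5.0))
print(classify_run(k1=1.0, k2=1.0, k3=1.0, x2_initial=5.0))
```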

A particular simulation run would automatically be classified below one of the
classes in Figure 6.10. Once the behavior has been classified, it can be summarized
and explained to the user at several levels. Certainly the raw simulation output or an
analytical solution could be displayed. But in their pure mathematical form, these
would not include any interpretation. Another possibility is a template response:

The prey population X1 is exploding because:

a)  The  predation  rate  of  X1-X2  is  too  low 

b)  The  predator  population  X2  is  too  low 

c)  The  predator  population  X2  is  extinct. 

Actual descriptions for X1 and X2 would be filled in depending on the particular
system  under  study,  and  a,  b,  or  c  would  be  selected  based  on  further  classification 
of  the  simulation  results. 
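
A template response of this kind amounts to string substitution once the reason has been selected. The sketch below is illustrative only; the names `REASONS` and `explain` are hypothetical, and the wording of each reason is copied from the template above.

```python
REASONS = {
    "a": "the predation rate of {prey}-{predator} is too low",
    "b": "the predator population {predator} is too low",
    "c": "the predator population {predator} is extinct",
}


def explain(choice, prey, predator):
    """Fill the template with descriptions of the prey and predator populations."""
    reason = REASONS[choice].format(prey=prey, predator=predator)
    return f"The prey population {prey} is exploding because {reason}."


print(explain("c", "X1", "X2"))
```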

In more complex situations, a more general natural language generation system
would be required to produce useful interpretations automatically. Consider the
following observation:

Variable  X2 

Represents:  Predator  Population  Level 
Value  at  T=35:  1.5 
Value  at  T=40:  0 

A  number  of  summarizing  statements  may  be  made  about  this  situation.  The  most 
literal  statement  is  generated  from  an  exact  interpretation: 

Variable X2 have value at T=35 of 1.5, and value at T=40 of 0.

Such a statement is generated by a mapping of the semantic structure down through
grammar rules. For example, the verb "have" is a linking verb connecting an entity
with its properties:

<Entity> have <Property> [, and <Property>]*

Now other interpretations can be substituted since a mathematical variable in
a simulation has an interpretation as representing some entity. The entities can be
substituted for the variables:

Predator  population  level  have  a  value  at  35  weeks  of  1.5, 
and  value  at  40  weeks  of  0. 

Now  this  phrase  can  be  subsumed  by  a  category  containing  situations  in  which  some 
value  is  decreasing  to  zero,  enabling  the  substitution: 

Predator population level decrease to zero at 40 weeks.

Also, since this form of "level decreasing to zero" is for a particular entity, namely a
population, it is subsumed by the concept extinction:

Predator population become extinct at 40 weeks.

The remaining step is to clean up this expression. One approach is to appeal
to rules for grammatical form, such as subject/verb agreement and tense. Thus
"become" is switched to "became." Although this fixes the grammar, the sentence
could be made even smoother by appealing to the case-base of expressions, in which
it is discovered that it is more common to say:

The  predator  population  became  extinct  at  40  weeks. 
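
The stepwise substitutions above can be traced with a small Python sketch. It is a hand-written illustration, not a general generation system: the dictionary layout and the function `summarize` are hypothetical, the time unit is taken from the substituted sentences, and the subsumption tests are reduced to two simple checks (a positive value falling to zero, and an entity that is a population).

```python
observation = {
    "symbol": "X2",
    "represents": "Predator population level",
    "time_unit": "weeks",  # taken from the substituted sentences in the text
    "values": [(35, 1.5), (40, 0)],
}


def summarize(obs):
    """Return the successive statements, from most literal to most interpreted."""
    steps = []
    entity, unit = obs["represents"], obs["time_unit"]

    # Step 1: literal reading of the observation through the "have" rule.
    props = [f"value at T={t} of {v:g}" for t, v in obs["values"]]
    steps.append(f"Variable {obs['symbol']} have " + ", and ".join(props) + ".")

    # Step 2: substitute the entity that the variable represents.
    props = [f"value at {t} {unit} of {v:g}" for t, v in obs["values"]]
    steps.append(f"{entity} have " + ", and ".join(props) + ".")

    # Step 3: subsume under "value decreasing to zero".
    (_, first_value), (last_time, last_value) = obs["values"][0], obs["values"][-1]
    if first_value > 0 and last_value == 0:
        steps.append(f"{entity} decrease to zero at {last_time} {unit}.")

        # Step 4: for a population, "level decreasing to zero" is extinction.
        if "population" in entity.lower():
            suffix = " level"
            population = entity[:-len(suffix)] if entity.endswith(suffix) else entity
            steps.append(f"{population} become extinct at {last_time} {unit}.")

            # Step 5: repair tense and add the article.
            smooth = population[0].lower() + population[1:]
            steps.append(f"The {smooth} became extinct at {last_time} {unit}.")
    return steps


for statement in summarize(observation):
    print(statement)
```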

6.7   Conclusions 

These  examples  have  illustrated  the  importance  of  models  in  understanding  and 
interpreting  language  expressions.  The  examples  covered  use  of  models  for  spatial  and 
temporal  reasoning,  understanding  quantitative  relationships,  and  interpreting  model 
behavior  through  language  generation.  Through  stepwise  transformation,  natural 
language  expressions  are  mapped  into  formal  models,  and  vice  versa. 

Notice in these examples that the natural language system is tightly coupled with
the qualitative model. In order to generate richness of expression and interpretation,
the structures used for representing the model must be fully compatible and directly
integrated with the structures for representing language semantics. Thus, coupling
natural language with simulation is not viewed as an interface problem. Loosely coupled
approaches are also possible, but would not be capable of the same level of reasoning.

The importance of categories and the role of models in forming categories have
also been illustrated. In the examples, the hierarchical structure of the model base is
seen to facilitate reasoning. Qualitative simulation models and their components act
as cognitive models for determining the membership of model instances and
experimental observations and interpreting natural language expressions. The examples do
not show the use of category theory for dealing with exceptions and explaining new
and unusual experimental observations. This is directly related to scientific
discovery and is an area which shows potential for future exploration. This chapter was
intended to illustrate ideas and approaches. Much work lies ahead in refinement and
implementation.


CHAPTER  7 
CONCLUSIONS  AND  FUTURE  DIRECTIONS 

It is hoped that the work presented here has put terminological knowledge
representation on a more accurate, justifiable, and solid basis. Building categories is
a fundamental cognitive process. What is known about that process must guide
development of formal representational systems.

The work presented here has its greatest implications for databases and
information retrieval systems. The results can be applied to data modeling, schema
generation, and query specification and processing. Efforts must continue in all these areas.
Queries based on incomplete information, and queries which retrieve indirect
information such as analogies, will be possible using the conceptual clustering algorithm.

The immediate need is to implement the new functions (INTERSECT, Exception
Condition, EVOLVE, Default Value, and Prototype). Work on CANDIDE has
already progressed to the point described in Chapter 2, and these new capabilities can
now be added. Work is also proceeding on an information retrieval system based on
CANDIDE, including a multimedia user interface based on semi-automatic
generation of displays. A number of projects are underway building real databases using
CANDIDE.

The performance and computational complexity of real systems based on the
techniques presented here must be evaluated. Now that a formal algorithm for conceptual
clustering has been specified, work can be done analyzing this complexity. Of
particular interest would be the use of parallel processing algorithms in conjunction with
Intersection. At the same time, the practical usefulness of these algorithms can only
be determined in field tests against real databases.

Lexical acquisition experiments are already proceeding. It is expected that three
phases will occur. First, the lexicon must be constructed by hand. Eventually phase
two will occur in which the lexical acquisition algorithm begins to recognize similar
existing instances when a new phrase is introduced. In the third phase, the algorithm
is able to provide an interpretation for a new and unfamiliar phrase. It is unclear
how long phase one will last, but there is already some evidence that phase two is
beginning to work on some existing, well-established lexical entries. But the open
question remains: How much background knowledge is needed before the system can
learn new phrases (phase three)? This can only be answered by experimentation, and
that will take some time.

Natural  language  processing  as  discussed  here  has  been  limited  to  simple  question 
answering,  although  that  much  would  be  extremely  useful  for  information  retrieval. 
There  are  certainly  many  other  areas  of  natural  language  which  must  be  explored. 
No  work  has  been  done  within  this  project  on  language  generation.  That  would  be 
a  valuable  feedback  mechanism  to  test  whether  lexical  knowledge  had  been  acquired 
successfully.  Ultimately,  the  system  could  be  used  for  processing  and  extracting 
knowledge  from  text.  That  would  require  considerable  work  in  dialogue  processing 
and require new facilities such as reference resolution.

The importance of cognitive models in building categories and in language
understanding has been demonstrated. Much work needs to be done on developing
and incorporating new structures for such models. The use of qualitative models in
language understanding and reasoning showed some interesting possibilities. Finally,
although an explanation has been provided for generation of empirical, inductive
models (the original PRIMITIVE classes), it is now unclear how more complex models
(the original DEFINED classes) can be generated.